java pdf accessibility pdfbox tagged-pdf

PDFBox 3.2: Missing Bounding Box Error in PAC for Image Tag in PDF/UA


I'm trying to create a PDF/UA document using PDFBox 3.2, and I’ve followed the solution suggested by @Tilman Hausherr in this Stack Overflow post. I managed to tag both text elements and images, and the image appears correctly tagged in PAC. However, I’m still getting an error in PAC indicating a missing bounding box for the image.

Here’s what I’ve tried so far to resolve this issue:

1. Marked content for the rectangle: I created marked content for the image’s rectangle and added it to the document. (No success; the error persisted.)
2. Adding COSName.BBOX to the Figure structure element: I added a new item COSName.BBOX with a Rectangle(x, y, width, height) to the figure structure element. (Resulted in a corrupted PDF.)
3. Adding COSName.BBOX to the Figure reference: I added a new item with COSName.BBOX to the figure reference, similar to step 2. (Also resulted in a corrupted PDF.)

Despite these efforts, I still don’t see anything in the structure representing the bounding box when I attempt to convert my PDF to PDF/UA. Any guidance on what I might be missing to correctly define the bounding box for the image in PDFBox 3.2 would be greatly appreciated!

Here is the code for image creation and tagging:

        COSDictionary markedContentDictionary3 = new COSDictionary();
        markedContentDictionary3.setInt(COSName.MCID, mcidCounter + 2);
        markedContentDictionary3.setString(COSName.ALT, "Alternate Image Description");

        PDMarkedContentReference mcr3 = new PDMarkedContentReference();
        mcr3.setMCID(mcidCounter + 2);

        //COSDictionary markedContentDictionary4 = new COSDictionary();
        //markedContentDictionary4.setInt(COSName.MCID, mcidCounter + 3);
        //PDMarkedContentReference mcr4 = new PDMarkedContentReference();
        //mcr4.setMCID(mcidCounter + 3);

        contentStream.beginMarkedContent(COSName.IMAGE, PDPropertyList.create(markedContentDictionary3));
        contentStream.drawImage(image, x, y, width, height);
        contentStream.endMarkedContent();
        // Close the content stream
        contentStream.close();

        PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
        figureElement.setPage(page);
        figureElement.setAlternateDescription("Dieses Bild zeigt: <dein_Tag_oder_Beschriftung>");

        figureElement.appendKid(mcr3);

        documentElement.appendKid(figureElement);



Solution

  • assign a number to the image:

    image.setStructParent(structParentCounter + 1);
    

  • include the figure element in the parent tree, and assign it a layout attribute that carries the bounding box:

    PDStructureElement figureElement = new PDStructureElement(StandardStructureTypes.Figure, documentElement);
    PDLayoutAttributeObject attributeObject = new PDLayoutAttributeObject();
    attributeObject.setBBox(new PDRectangle(x, y, width, height));
    attributeObject.setPlacement(PDLayoutAttributeObject.PLACEMENT_BLOCK);
    figureElement.addAttribute(attributeObject);
    figureElement.setPage(page);
    figureElement.setAlternateDescription("Dieses Bild zeigt: <dein_Tag_oder_Beschriftung>");
    PDMarkedContentReference mcr3 = new PDMarkedContentReference();
    mcr3.setMCID(mcidCounter + 2);
    figureElement.appendKid(mcr3);
    documentElement.appendKid(figureElement);
    parentTreeMap.put(structParentCounter + 1, figureElement);
    // add to the array from SO 79126664, the 0-based index = MCID
    ar.add(null); // placeholder: you have an MCID "1" that I know nothing about
    ar.add(figureElement); 
    

  • also don't forget to call

    structureTreeRoot.setParentTreeNextKey()

    with the highest key used in the parent tree, plus 1.
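
The bookkeeping behind these steps can be sketched in plain Java, without PDFBox. This is only an illustration under stated assumptions: the class `TaggingBookkeeping` and its three methods are hypothetical names, not part of PDFBox. It shows the two corners that a PDF rectangle built from x, y, width and height stores, how an element lands in the array slot whose 0-based index equals its MCID, and how the next parent tree key follows from the keys already used.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a PDFBox class): plain-Java bookkeeping only.
final class TaggingBookkeeping {

    private TaggingBookkeeping() { }

    // A PDF rectangle is stored as two corners: [llx, lly, urx, ury].
    // Given a lower-left corner plus width and height, compute those four numbers.
    static float[] bboxCorners(float x, float y, float width, float height) {
        return new float[] { x, y, x + width, y + height };
    }

    // Store the element at index == mcid, padding any skipped MCIDs with null,
    // so that the 0-based index in the list always equals the MCID.
    static <T> void putAtMcid(List<T> kids, int mcid, T element) {
        while (kids.size() <= mcid) {
            kids.add(null);
        }
        kids.set(mcid, element);
    }

    // Next free parent tree key: the highest key in use plus 1, or 0 if none is used yet.
    static int nextParentTreeKey(Map<Integer, ?> parentTreeMap) {
        if (parentTreeMap.isEmpty()) {
            return 0;
        }
        return Collections.max(parentTreeMap.keySet()) + 1;
    }
}
```

Assuming `parentTreeMap` is the map from the snippet above, the value returned by `nextParentTreeKey(parentTreeMap)` is what would be passed to `structureTreeRoot.setParentTreeNextKey(...)`.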