
Java PDFBox: Remove the parent element in tagged PDF


PAC3 validation reports the warning Possibly inappropriate use of a "Table" structure element for my PDF. The table structure in the PDF is as follows:

[image]

To pass the PAC3 check, I now drag the tables out of their parent tag so that each table becomes an element of its own, as shown below:

[image: example]

I tried the code below, but it didn't work:

PDStructureElement parent=(PDStructureElement)element.getParent();

//parent.setStructureType(StandardStructureTypes.TABLE);
element.insertBefore(element,parent);
element.setParent(parent.getParent());

Please help me with this.


Solution

  • The main issue in the code you show is that you call insertBefore on the element itself, so you try to insert the element as a kid of itself instead of as a kid of its current grandparent:

    element.insertBefore(element,parent);
    

    You can make it work like this:

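    if (element instanceof PDStructureElement) {
        PDStructureElement pdStructureElement = (PDStructureElement) element;
        // Only Table elements are moved.
        if ("Table".equals(pdStructureElement.getStructureType())) {
            PDStructureNode parentNode = pdStructureElement.getParent();
            // The parent must itself be a structure element, not the structure tree root.
            if (parentNode instanceof PDStructureElement) {
                PDStructureElement parent = (PDStructureElement) parentNode;
                PDStructureNode newParentNode = parent.getParent();
                if (newParentNode != null) {
                    // Insert the table into the grandparent, right before its old parent ...
                    newParentNode.insertBefore(pdStructureElement, parent);
                    // ... point the table's parent reference at the grandparent ...
                    pdStructureElement.setParent(newParentNode);
                    // ... and remove the old parent from the grandparent.
                    newParentNode.removeKid(parent);
                }
            }
        }
    }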
    

    (from the MoveInStructureTree helper method checkAndMoveTableUp)

    Applying this recursively to the structure tree of your PDF removes the Possibly inappropriate use of a "Table" structure element warning in PAC3 validation, cf. the MoveInStructureTree test testMoveTableUpTradeSimple1.

    (This code assumes that, as in your example document, every Table element is the only kid of the parent element it replaces. For other structures you have to add sanity checks and probably some special treatment.)
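
To illustrate what "applying this recursively" with a sanity check could look like, here is a minimal, self-contained sketch. It deliberately does not use PDFBox: the Node class, the method moveTablesUp, and the "Div" wrapper in the sample tree are hypothetical stand-ins for PDStructureNode/PDStructureElement and your real structure, so that the traversal logic can be run on its own. The single-kid check is one possible sanity check, an assumption on my part rather than something PAC3 prescribes.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a structure tree; this is NOT the PDFBox API. */
class TableMoveSketch {

    /** Hypothetical minimal node: a structure type, a parent link, ordered kids. */
    static class Node {
        final String type;
        Node parent;
        final List<Node> kids = new ArrayList<>();

        Node(String type, Node... children) {
            this.type = type;
            for (Node child : children) {
                child.parent = this;
                kids.add(child);
            }
        }

        @Override
        public String toString() {
            return kids.isEmpty() ? type : type + kids;
        }
    }

    /** Visits the subtree depth-first (kids before the node itself). */
    static void moveTablesUp(Node node) {
        // Iterate over a snapshot, because moving a table edits a kid list
        // further up the tree while we are still walking it.
        for (Node kid : new ArrayList<>(node.kids)) {
            moveTablesUp(kid);
        }
        checkAndMoveTableUp(node);
    }

    /** Replaces the parent of a Table by the Table itself, if that is safe. */
    static boolean checkAndMoveTableUp(Node element) {
        if (!"Table".equals(element.type)) {
            return false;
        }
        Node parent = element.parent;
        // No parent at all, or the parent is the tree root: nothing to replace.
        if (parent == null || parent.parent == null) {
            return false;
        }
        // Sanity check (assumption): only replace wrappers that hold nothing else.
        if (parent.kids.size() != 1) {
            return false;
        }
        Node grandParent = parent.parent;
        // Put the table into the slot its old parent occupied.
        grandParent.kids.set(grandParent.kids.indexOf(parent), element);
        element.parent = grandParent;
        parent.parent = null;
        return true;
    }

    public static void main(String[] args) {
        Node root = new Node("Root",
                new Node("Document",
                        new Node("P"),
                        new Node("Div", new Node("Table", new Node("TR")))));
        System.out.println(root); // Root[Document[P, Div[Table[TR]]]]
        moveTablesUp(root);
        System.out.println(root); // Root[Document[P, Table[TR]]]
    }
}
```

With PDFBox, the same traversal would start at the structure tree root of the document and call the checkAndMoveTableUp code shown above for each kid that is a PDStructureElement. Iterating over a copy of the kid list matters there as well, since the move modifies the kids of an ancestor during the walk.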