Tags: java, pdf, itext7, pdf-annotations

PdfPopupAnnotation parent not accessible after copying page to other document


I'm copying annotated PDF pages from one document to another. The odd thing I'm experiencing is that in the new document, I can't access the parent of the PdfPopupAnnotations:

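import java.io.IOException;

import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfPopupAnnotation;

public class CopyPdfTest {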
    public static void main(String[] args) throws IOException {
        PdfDocument inputDoc = new PdfDocument(new PdfReader("src/test/resources/input.pdf"));
        PdfDocument outputDoc = new PdfDocument(new PdfWriter("/tmp/output.pdf"));

        // Copy pages
        for (int i = 1; i <= inputDoc.getNumberOfPages(); i++) {
            inputDoc.copyPagesTo(i, i, outputDoc);
        }

        // Re-open outputDoc to eliminate the possibility the problem stems from
        // it being opened in writing mode
        outputDoc.close();
        outputDoc = new PdfDocument(new PdfReader("/tmp/output.pdf"));

        // Step through the PdfPopupAnnotations in both documents and check for their parents
        for (PdfDocument doc : new PdfDocument[] { inputDoc, outputDoc } ) {
            for (int i = 1; i <= doc.getNumberOfPages(); i++) {
                for (PdfAnnotation annot : doc.getPage(i).getAnnotations()) {
                    if (annot instanceof PdfPopupAnnotation) {
                        // This prints null for popups from the outputDoc
                        System.out.println(((PdfPopupAnnotation) annot).getParentObject());
                    }
                }
            }
        }
    }
}

This results in the following output when processing a PDF with one /Square annotation (first line prints popup annotation parent from original PDF, second line prints null for output PDF):

<</AP <</N 10 0 R >> /C [0.898026 0.133331 0.215683 ] /Contents test /CreationDate D:20180107105025+01'00' /F 4 /M D:20180107105029+01'00' /NM 8a233cc7-ed2f-48bf-91f2-a46cecf15160 /P 9 0 R /Popup 16 0 R /RC <?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:18.9.0" xfa:spec="2.0.2" ><p dir="ltr"><span dir="ltr" style="font-size:10.5pt;text-align:left;color:#000000;font-weight:normal;font-style:normal">test</span></p></body> /RD [0.5 0.5 0.5 0.5 ] /Rect [84.7495 636.205 191.876 764.21 ] /Subj Rectangle /Subtype /Square /T tom /Type /Annot >>
null

I find this especially odd because, in an uncompressed example PDF, the parent reference 4 0 R appears to remain intact, and the referenced /Square annotation is still present as 4 0 obj.

input.pdf

%PDF-1.4
%âãÏÓ
5 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj 
6 0 obj 
<<
/FormType 1
/Subtype /Form
/Type /XObject
/BBox [115.975 693.768 179.508 827.883]
/Length 69
/Matrix [1 0 0 1 -115.975 -693.768]
>>
stream
1.000 0.000 0.000 RG
2 w
0 J
0 j
116.975 694.768 61.534 132.115 re
S

endstream 
endobj 
4 0 obj 
<<
/Subtype /Square
/RD [0 0 0 0]
/RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.0" xfa:spec="2.0.2"><p dir="ltr"><span style="text-align:left;font-size:13pt;font-style:normal;font-weight:normal;color:#000000;font-family:Arial">test</span></p></body>)
/T (thw)
/Contents (test)
/Rect [115.975 693.768 179.508 827.883]
/CA 1
/P 3 0 R
/M (D:20180107100342+01'00')
/Type /Annot
/NM (fd33d765-e844-4226-aff8-3ef81361e787)
/F 4
/BS 
<<
/W 2
/S /S
>>
/AP 
<<
/N 6 0 R
>>
/C [1 0 0]
/Popup 5 0 R
/Subj (Rectangle)
/CreationDate (D:20180107100338+01'00')
>>
endobj 
8 0 obj 
<<
/OPM 1
/Type /ExtGState
>>
endobj 
7 0 obj 
<<
/R7 8 0 R
>>
endobj 
9 0 obj 
<<
/Length 30
>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
Q

endstream 
endobj 
3 0 obj 
<<
/pdftk_PageNum 1
/Annots [4 0 R 5 0 R]
/Resources 
<<
/ProcSet [/PDF]
/ExtGState 7 0 R
>>
/Type /Page
/Parent 1 0 R
/Contents 9 0 R
/MediaBox [0 0 595 842]
>>
endobj 
1 0 obj 
<<
/Kids [3 0 R]
/Type /Pages
/Count 1
>>
endobj 
11 0 obj 
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj 
12 0 obj 
<<
/ModDate (D:20180107101601+01'00')
/CreationDate (D:20180107101601+01'00')
/Creator (pdftk 2.02 - www.pdftk.com)
/Producer (itext-paulo-155 \(itextpdf.sf.net-lowagie.com\))
>>
endobj xref
0 13
0000000000 65535 f 
0000001472 00000 n 
0000000000 65535 f 
0000001293 00000 n 
0000000460 00000 n 
0000000015 00000 n 
0000000220 00000 n 
0000001177 00000 n 
0000001130 00000 n 
0000001210 00000 n 
0000000000 65535 f 
0000001531 00000 n 
0000001583 00000 n 
trailer

<<
/Info 12 0 R
/ID [<23bde7d1ea6b4f52b55dc534b36f8d41><e031fe688c87cb2303e0a99487c3025e>]
/Root 11 0 R
/Size 13
>>
startxref
1779
%%EOF

output.pdf

%PDF-1.7
%âãÏÓ
5 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj 
6 0 obj 
<<
/FormType 1
/Subtype /Form
/Type /XObject
/BBox [115.975 693.768 179.508 827.883]
/Length 69
/Matrix [1 0 0 1 -115.975 -693.768]
>>
stream
1.000 0.000 0.000 RG
2 w
0 J
0 j
116.975 694.768 61.534 132.115 re
S

endstream 
endobj 
4 0 obj 
<<
/Subtype /Square
/RD [0 0 0 0]
/RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.0" xfa:spec="2.0.2"><p dir="ltr"><span style="text-align:left;font-size:13pt;font-style:normal;font-weight:normal;color:#000000;font-family:Arial">test</span></p></body>)
/T (thw)
/Contents (test)
/Rect [115.975 693.768 179.508 827.883]
/CA 1
/P 3 0 R
/M (D:20180107100342+01'00')
/Type /Annot
/NM (fd33d765-e844-4226-aff8-3ef81361e787)
/F 4
/BS 
<<
/W 2
/S /S
>>
/AP 
<<
/N 6 0 R
>>
/C [1 0 0]
/Popup 5 0 R
/Subj (Rectangle)
/CreationDate (D:20180107100338+01'00')
>>
endobj 
7 0 obj 
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Open false
/Type /Annot
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj 
9 0 obj 
<<
/OPM 1
/Type /ExtGState
>>
endobj 
8 0 obj 
<<
/R7 9 0 R
>>
endobj 
10 0 obj 
<<
/Length 30
>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
Q

endstream 
endobj 
3 0 obj 
<<
/pdftk_PageNum 1
/Annots [4 0 R 7 0 R]
/Resources 
<<
/ProcSet [/PDF]
/ExtGState 8 0 R
>>
/Contents 10 0 R
/Parent 1 0 R
/Type /Page
/MediaBox [0 0 595 842]
>>
endobj 
1 0 obj 
<<
/Kids [3 0 R]
/Type /Pages
/Count 1
>>
endobj 
12 0 obj 
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj 
13 0 obj 
<<
/ModDate (D:20180107101427+01'00')
/CreationDate (D:20180107101427+01'00')
/Creator (pdftk 2.02 - www.pdftk.com)
/Producer (itext-paulo-155 \(itextpdf.sf.net-lowagie.com\))
>>
endobj xref
0 14
0000000000 65535 f 
0000001665 00000 n 
0000000000 65535 f 
0000001485 00000 n 
0000000460 00000 n 
0000000015 00000 n 
0000000220 00000 n 
0000001130 00000 n 
0000001368 00000 n 
0000001321 00000 n 
0000001401 00000 n 
0000000000 65535 f 
0000001724 00000 n 
0000001776 00000 n 
trailer

<<
/Info 13 0 R
/ID [<09baf689039bb6015d4c428111e4ee72><684b5613b1931e88255384276dcaceb1>]
/Root 12 0 R
/Size 14
>>
startxref
1972
%%EOF

Any hints as to why this happens and how to make the parent accessible via iText?


Solution

  • Unfortunately the OP did not provide the PDF files in binary form, so I could not simply check the following; looking at the data, though, a difference is obvious...

    The popup object in your input.pdf has a Parent entry:

    5 0 obj 
    <<
    /M (D:20180107100338+01'00')
    /NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
    /Subtype /Popup
    /Type /Annot
    /Parent 4 0 R
    /Open false
    /F 28
    /Rect [352.966 707.883 532.966 827.883]
    /P 3 0 R
    >>
    endobj
    

    In your output.pdf, on the other hand, the popup object doesn't:

    7 0 obj 
    <<
    /M (D:20180107100338+01'00')
    /NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
    /Subtype /Popup
    /Open false
    /Type /Annot
    /F 28
    /Rect [352.966 707.883 532.966 827.883]
    /P 3 0 R
    >>
    

    This also matches the iText 7 code of the getParentObject and getParent methods:

    public PdfDictionary getParentObject() {
        return getPdfObject().getAsDictionary(PdfName.Parent);
    }
    
    public PdfAnnotation getParent() {
        if (parent == null) {
            parent = makeAnnotation(getParentObject());
        }
        return parent;
    }
    

    Thus, to make the parent accessible by iText, make sure the popup annotation has a Parent entry!


    Yes, I know, the Parent entry is optional. But getParent does not claim to determine the actual parent object; it merely returns the object referenced by the Parent entry.
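
The lookup behaviour described above can be modelled without iText. The following is only a sketch: plain maps stand in for PDF dictionaries, and the class ParentLookupSketch and its dict helper are hypothetical names, not part of the iText API. It shows that a lookup keyed on the Parent entry has nothing to return once that entry is gone, even though the parent annotation itself still exists.

```java
import java.util.HashMap;
import java.util.Map;

public class ParentLookupSketch {
    // Hypothetical helper: builds a map from alternating key/value arguments.
    public static Map<String, Object> dict(Object... keyValues) {
        Map<String, Object> d = new HashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            d.put((String) keyValues[i], keyValues[i + 1]);
        }
        return d;
    }

    // Models the lookup: return the Parent value if it is a dictionary, else null.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> getParentObject(Map<String, Object> popup) {
        Object parent = popup.get("Parent");
        return parent instanceof Map ? (Map<String, Object>) parent : null;
    }

    public static void main(String[] args) {
        Map<String, Object> square = dict("Subtype", "Square", "Contents", "test");
        // Popup as in input.pdf: carries a Parent entry.
        Map<String, Object> inputPopup = dict("Subtype", "Popup", "Parent", square);
        // Popup as in output.pdf (object 7): no Parent entry.
        Map<String, Object> outputPopup = dict("Subtype", "Popup");

        System.out.println(getParentObject(inputPopup) == square); // prints true
        System.out.println(getParentObject(outputPopup));          // prints null
    }
}
```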


    Another issue in your output.pdf:

    When analysing the files, you probably looked only at the pop-up/parent relations between annotation objects and not at the annotations of the page, and therefore thought your pop-up had a Parent entry...
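
In the output.pdf listing above, the page's /Annots array is [4 0 R 7 0 R]: the popup that iText reaches through the page is object 7, which has no Parent entry, while object 5 (which still has /Parent 4 0 R) is no longer referenced from the page. One way to check this on uncompressed files is to start from the page's /Annots array. The following is a minimal sketch in plain Java without iText; the class AnnotsCheck and its methods are hypothetical, the regex scan is not a real PDF parser, and it assumes objects with generation 0 stored as readable text, as in the listings above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnnotsCheck {
    // Body of "<num> 0 obj ... endobj", or null if there is no such object.
    public static String objectBody(String pdf, int num) {
        Matcher m = Pattern.compile("(?s)(?<!\\d)" + num + " 0 obj\\s*(.*?)endobj").matcher(pdf);
        return m.find() ? m.group(1) : null;
    }

    // Object numbers referenced from the /Annots array of a page body.
    public static List<Integer> annotRefs(String pageBody) {
        List<Integer> refs = new ArrayList<>();
        Matcher array = Pattern.compile("/Annots\\s*\\[([^\\]]*)\\]").matcher(pageBody);
        if (array.find()) {
            Matcher ref = Pattern.compile("(\\d+)\\s+0\\s+R").matcher(array.group(1));
            while (ref.find()) {
                refs.add(Integer.parseInt(ref.group(1)));
            }
        }
        return refs;
    }

    // Popup annotations reachable from the page that carry no /Parent entry.
    public static List<Integer> popupsWithoutParent(String pdf, int pageObj) {
        List<Integer> result = new ArrayList<>();
        String page = objectBody(pdf, pageObj);
        if (page == null) {
            return result;
        }
        for (int num : annotRefs(page)) {
            String body = objectBody(pdf, num);
            if (body != null
                    && Pattern.compile("/Subtype\\s*/Popup").matcher(body).find()
                    && !body.contains("/Parent")) {
                result.add(num);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Abbreviated stand-in for output.pdf; only the relevant entries are kept.
        String pdf = String.join("\n",
                "5 0 obj << /Subtype /Popup /Parent 4 0 R /P 3 0 R >> endobj",
                "4 0 obj << /Subtype /Square /Popup 5 0 R /P 3 0 R >> endobj",
                "7 0 obj << /Subtype /Popup /P 3 0 R >> endobj",
                "3 0 obj << /Type /Page /Annots [4 0 R 7 0 R] /Parent 1 0 R >> endobj");
        System.out.println(popupsWithoutParent(pdf, 3)); // prints [7]
    }
}
```

Run against an equally abbreviated input.pdf, where /Annots is [4 0 R 5 0 R] and object 5 has a Parent entry, the same check returns an empty list.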