java itext form-fields

Unable to extract all the tags from a PDF using the iText 8 getAllFormFields method


We are migrating from iText 5.5 to iText 8.

After the migration, we are unable to extract a tag from a PDF using the new iText 8 code, although the old iText 5.5 code extracts it.

Below is the code to get the tags.

New code:

PdfAcroForm pdfAcroForm = PdfAcroForm.getAcroForm(pdfDocument, true);
// or, as an alternative to the line above:
PdfAcroForm pdfAcroForm = PdfFormCreator.getAcroForm(pdfDocument, true);
Map<String, PdfFormField> formFieldMap = pdfAcroForm.getAllFormFields();

Old code:

reader = new com.itextpdf.text.pdf.PdfReader(docdByteArray);
PdfStamper stamper = new PdfStamper(reader, baos);
AcroFields acroFields = stamper.getAcroFields();

We also tried Apache PDFBox, but we get the same result as with the new iText 8 code.

The issue happens with only one or two PDFs. We have another PDF with the same tag, and we are able to extract that tag with both the new and the old code.

We are able to extract the tag from the original template, but extraction fails after the PDF has been prefilled.

In our case, is the PDF file corrupted, is the field structure damaged, or is there some other issue?

We cannot attach the PDF documents because they are confidential.

We expect iText 8 to return the same result as iText 5.5 for both the original and the prefilled PDFs.


Solution

  • The main difference between iText5 and 8 in terms of handling acroform fields is that

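    Also note that a widget annotation dictionary may itself contain the form field's data, because the PDF specification allows a field dictionary and its only widget annotation dictionary to be merged into one.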

    This is where the difference shows: if an annotation is present only on the page and is not referenced from the AcroForm dictionary in any way (neither directly in the Fields array nor as a child of any field), iText 8 does not treat it as a form field annotation (in accordance with the PDF specification) and does not return it.

    You can, however, try to collect such annotations yourself and add them to the AcroForm, thereby fixing the document. The code would be as follows:

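    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import com.itextpdf.forms.fields.PdfFormField;
    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfName;
    import com.itextpdf.kernel.pdf.PdfPage;
    import com.itextpdf.kernel.pdf.PdfString;
    import com.itextpdf.kernel.pdf.annot.PdfAnnotation;

    // Collects every widget annotation found on the pages, whether or not the AcroForm references it.
    private Map<String, PdfFormField> populateFields(PdfDocument doc) {
        Map<String, PdfFormField> fields = new HashMap<>();
        for (int i = 1; i <= doc.getNumberOfPages(); ++i) {
            PdfPage page = doc.getPage(i);
            List<PdfAnnotation> annotations = page.getAnnotations();
            for (PdfAnnotation annot : annotations) {
                // Constant-first comparison also skips annotations without a subtype.
                if (!PdfName.Widget.equals(annot.getSubtype())) {
                    continue;
                }
                PdfFormField field = PdfFormField.makeFormField(annot.getPdfObject(), doc);
                // Skip widgets that do not yield a field or a field name.
                PdfString name = field == null ? null : field.getFieldName();
                if (name != null) {
                    fields.put(name.toUnicodeString(), field);
                }
            }
        }

        return fields;
    }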
    

    Note that this code does not take the AcroForm field hierarchy into account; it builds only a flat list of widget annotations, which may or may not be referenced from the AcroForm. So if the map returned by this method contains anything that is missing from the result of PdfAcroForm.getAllFormFields(), you can add it to the AcroForm using PdfAcroForm.addField.
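
    The comparison step described above can be sketched without any iText types, since it only needs the key sets of the two maps. The class FieldDiff and the method missingFromAcroForm are hypothetical names introduced for illustration; they are not part of iText. This is a minimal sketch assuming both maps are keyed by the full field name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only; not part of iText.
final class FieldDiff {
    private FieldDiff() {
    }

    // Returns, in sorted order, the names that occur in fromPages
    // but are absent from fromAcroForm.
    static Set<String> missingFromAcroForm(Map<String, ?> fromPages,
                                           Map<String, ?> fromAcroForm) {
        // Copy the key set so that neither input map is modified.
        Set<String> missing = new TreeSet<>(fromPages.keySet());
        missing.removeAll(fromAcroForm.keySet());
        return missing;
    }
}
```

    With iText 8 you would presumably pass the map returned by populateFields as the first argument and the result of PdfAcroForm.getAllFormFields() as the second, then call PdfAcroForm.addField for the field behind each returned name. Whether the repaired document then behaves as expected has not been verified here, so check the result against one of the affected PDFs.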