pdf pdftk fdf

excluding invisible fields from pdftk


I'm using /usr/bin/pdftk filename.pdf dump_data_fields output - flatten to get the FDF fields in a PDF but it seems to be including invisible FDF fields as well.

https://docdro.id/nriB59b is a one-page PDF without any text but with a number of these invisible FDF fields. pdftk's output can be seen at https://pastebin.com/ag6vweNP.

How can I exclude invisible FDF fields?

I'm currently using pdftk but I'm open to using other tools as well.

Thanks!


Solution

  • My guess is that you have to inspect the PDF yourself to detect whether a field is invisible. On the other hand, it can be very tricky to tell whether a field is invisible unless a flag says so.

    For example (I don't know whether this is actually possible), say a field lies outside the page or is covered by other content: is it visible or not?

    By the way, you can use qpdf to inspect the content of a PDF file. The following command decompresses your PDF into a human-readable form:

    qpdf --qdf --object-streams=disable orig.pdf uncompressed-qpdf.pdf
    

    If you prefer a JSON representation:

    qpdf --json your_pdf.pdf > your_pdf.json
    

    If you go for the latter, you can parse the JSON output with jq.
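
If jq is not a requirement, the same inspection can be sketched in Python. This is only a rough sketch, not a tested solution for the linked file: it assumes the fields of interest are widget annotations stored as JSON objects with "/Subtype": "/Widget", and that "invisible" means the Hidden (bit 2) or NoView (bit 6) annotation flag is set in /F. The function names are made up for illustration. Fields hidden in other ways (zero-size rectangle, off-page position, covered by other content) are not detected, and a widget whose name lives on a parent field object will show up with a name of None.

```python
import json

# Annotation flag bits from the PDF specification (/F entry of an annotation).
# The Invisible flag (bit 1) only affects non-standard annotation types,
# so this sketch checks Hidden and NoView instead.
HIDDEN = 1 << 1   # bit 2: never displayed or printed
NO_VIEW = 1 << 5  # bit 6: not displayed on screen


def load_qpdf_json(path):
    """Load the file written by `qpdf --json your_pdf.pdf > your_pdf.json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def iter_widgets(node):
    """Recursively yield every dict that looks like a widget annotation.

    Walking the whole tree avoids depending on the exact top-level layout
    of the qpdf JSON, which differs between qpdf JSON versions.
    """
    if isinstance(node, dict):
        if node.get("/Subtype") == "/Widget":
            yield node
        for value in node.values():
            yield from iter_widgets(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_widgets(item)


def is_flagged_hidden(widget):
    """Return True if the widget's /F flags have Hidden or NoView set."""
    flags = widget.get("/F", 0)
    return isinstance(flags, int) and bool(flags & (HIDDEN | NO_VIEW))


def visible_field_names(qpdf_json):
    """Return the /T value of each widget not flagged as hidden (assumption:
    /T sits on the widget itself; it is None when stored on a parent)."""
    return [w.get("/T") for w in iter_widgets(qpdf_json)
            if not is_flagged_hidden(w)]
```

Calling visible_field_names(load_qpdf_json("your_pdf.json")) would then list the names to keep, which could be used to filter pdftk's dump_data_fields output. Depending on the qpdf JSON version, string values such as /T may carry a prefix (for example "u:"), so some normalization may be needed before comparing names.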

    Then consult the version of the PDF specification you want to apply. I also suggest these steps: