I'm using /usr/bin/pdftk filename.pdf dump_data_fields output - flatten
to get the FDF fields in a PDF, but it seems to include invisible FDF fields as well.
https://docdro.id/nriB59b is a one-page PDF without any text but with a number of these invisible FDF fields. pdftk's output can be seen at https://pastebin.com/ag6vweNP.
How can I exclude invisible FDF fields?
I'm currently using pdftk but I'm open to using other tools as well.
Thanks!
My guess is that you have to inspect the PDF yourself to detect whether a field is invisible. On the other hand, it can be very tricky to tell whether a field is invisible unless a flag explicitly says so.
For example (I don't know if this is possible), say a field lies outside the page or is covered by other content: is it visible or not?
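If you do end up inspecting the PDF yourself, a minimal Python sketch of such a check could look like the following. It assumes you have already extracted, for each field's widget annotation, the integer /F (annotation flags) value, the /Rect, and the page's /MediaBox. The function name looks_invisible and the choice of heuristics (Hidden or NoView flag set, zero area, lying entirely outside the page box) are my own, not a standard definition of "invisible", and fields covered by other content are not detected. The Invisible flag (bit 1) is ignored because it only applies to annotation types the viewer does not recognize, and form widgets are a standard type.

```python
# Annotation flag bits (/F entry of an annotation), per ISO 32000-1, Table 165.
HIDDEN = 1 << 1   # bit 2: never displayed or printed
NO_VIEW = 1 << 5  # bit 6: not displayed on screen (may still be printed)


def _normalize(box):
    """Return (left, bottom, right, top); PDF rectangles may list corners in any order."""
    x1, y1, x2, y2 = box
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def looks_invisible(flags, rect, page_box):
    """Heuristic visibility check for a widget annotation (hypothetical helper).

    flags    -- integer value of the widget's /F entry (0 if absent)
    rect     -- the widget's /Rect as [x1, y1, x2, y2]
    page_box -- the page's /MediaBox (or /CropBox) as [x1, y1, x2, y2]
    """
    # 1. Explicitly flagged as hidden or not viewable on screen.
    if flags & (HIDDEN | NO_VIEW):
        return True
    left, bottom, right, top = _normalize(rect)
    # 2. Zero-width or zero-height widget: nothing can be drawn.
    if right <= left or top <= bottom:
        return True
    # 3. Entirely outside the page box.
    p_left, p_bottom, p_right, p_top = _normalize(page_box)
    return right <= p_left or left >= p_right or top <= p_bottom or bottom >= p_top
```

For example, looks_invisible(0, [700, 10, 800, 30], [0, 0, 612, 792]) returns True because the widget starts to the right of a US Letter page.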
By the way, you can use qpdf to inspect the content of a PDF file. The following command decompresses your PDF so that it is human-readable:
qpdf --qdf --object-streams=disable orig.pdf uncompressed-qpdf.pdf
If you prefer a JSON representation:
qpdf --json your_pdf.pdf > your_pdf.json
If you go for the latter, you can parse the JSON output with jq.
Then, apply whichever rules from the PDF specification you want to use to decide what counts as invisible. I also suggest using diff.
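As a sketch of the JSON route, here is a small Python alternative to jq that lists the fields whose widget is not flagged Hidden or NoView. It assumes the layout qpdf --json produces for its "acroform" section (a "fields" list whose entries carry "fullname" and "annotation.annotationflags"); check this against the JSON your qpdf version actually emits, since the format has changed between versions. The function name visible_field_names and the script name visible_fields.py are hypothetical.

```python
import json
import sys

# Annotation flag bits, per ISO 32000-1, Table 165.
HIDDEN = 1 << 1   # never displayed or printed
NO_VIEW = 1 << 5  # not displayed on screen


def visible_field_names(qpdf_json):
    """Return field names (in order, without duplicates) that have at least
    one widget not flagged Hidden or NoView.

    Assumes qpdf's "acroform" -> "fields" layout; adjust the keys if your
    qpdf version's JSON differs.
    """
    names = []
    for field in qpdf_json.get("acroform", {}).get("fields", []):
        flags = field.get("annotation", {}).get("annotationflags", 0)
        name = field.get("fullname", "")
        # A field with several widgets may appear more than once; keep it once.
        if not flags & (HIDDEN | NO_VIEW) and name not in names:
            names.append(name)
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python visible_fields.py your_pdf.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for field_name in visible_field_names(json.load(f)):
            print(field_name)
```

You would first run qpdf --json your_pdf.pdf > your_pdf.json and then pass your_pdf.json to the script. This only covers the flag-based case; the geometric cases (zero size, off-page, covered) need the widget rectangles as well.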