I am writing a Python script that needs to read the data entered into a PDF form, as part of a larger script. I tried PyPDF3, but while it can show me the strings in the form, it does not show the filled-in data. I have a form where I entered the value 'XXX' into a field, and I want the script to return both that value and the name of the field, but I can't seem to read the data. The fillpdfs module is very helpful, but AFAICT it returns the field names and not the data. I have this snippet:
from PyPDF3 import PdfFileReader
# Open the PDF file
pdf_file = open('filename.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)
# Extract text data from each page and check for the filled-in value
for page_num in range(pdf_reader.numPages):
    page = pdf_reader.getPage(page_num)
    print(page_num, 'XXX' in page.extractText())
There is also a function for PDF forms:
dictionary = pdf_reader.getFormTextFields() # returns a python dictionary
print(dictionary)
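For reference, here is a minimal sketch of the result I am after: a mapping from each field name to its filled-in value. It assumes the form is a standard AcroForm, where each field keeps its value under the `/V` key, and that `PdfFileReader.getFields()` returns those field dictionaries (or `None` when the file has no form). The helper names `field_values` and `read_form_values` are made up for illustration and are not part of PyPDF3. If the form stores its data some other way (for example as XFA), this will probably come back empty.

```python
def field_values(fields):
    """Map each field name to its filled-in value (the '/V' entry), or None."""
    # `fields` is expected to look like the result of getFields():
    # {field_name: dict-like field object}. None is treated as "no fields".
    return {name: field.get("/V") for name, field in (fields or {}).items()}


def read_form_values(path):
    """Hypothetical wrapper: open a PDF and return {field_name: value}."""
    # Imported here so field_values() can be tried without PyPDF3 installed.
    from PyPDF3 import PdfFileReader

    with open(path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        # Read the values while the file is still open.
        return field_values(reader.getFields())
```

Calling `read_form_values('filename.pdf')` should then give a dictionary of field names and entered values. A value of `None` would mean the field exists but has no `/V` entry, that is, nothing was saved into it.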