What I am trying to do is fill form fields in a PDF with values and then mark them as read-only so that they cannot be edited further.
I have tried two approaches. The first is to call `pdf.bake()`.

This does not work for my use case, because after filling in the data I apply e-signatures to the PDF. If I use `bake()`, all the signatures show up on the first page, even though they are supposed to be distributed across different pages.

The second approach I tried is marking each widget as read-only:
```python
import fitz  # PyMuPDF

# pdf_stream holds the bytes of the already filled-in PDF
pdf_document = fitz.open("pdf", pdf_stream)
READONLY_FLAG = 1 << 0
processed_fields = set()
for index, page in enumerate(pdf_document):
    for field in page.widgets():
        flag_image = False
        if not flag_image:
            if field.field_name not in processed_fields:
                print(f"non repeated name {field.field_name}, page {index}")
                processed_fields.add(field.field_name)
                field.field_flags |= READONLY_FLAG
                field.update()
```
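The `|=` in the code above matters because /Ff is a bit field: OR-ing in the lowest bit sets ReadOnly without clearing other flags such as Required, and doing it twice changes nothing. A minimal pure-Python sketch of that arithmetic follows; the constant and function names are illustrative only, while the bit positions follow the field-flag table of the PDF specification (which numbers bits from 1, so bit n has the value `1 << (n - 1)`).

```python
# Field flag bits (the PDF spec numbers them from 1, so bit n == 1 << (n - 1)).
READ_ONLY = 1 << 0  # bit 1
REQUIRED = 1 << 1   # bit 2
NO_EXPORT = 1 << 2  # bit 3


def with_read_only(ff: int) -> int:
    """Set ReadOnly, leaving every other flag bit untouched."""
    return ff | READ_ONLY


def without_read_only(ff: int) -> int:
    """Clear ReadOnly, leaving every other flag bit untouched."""
    return ff & ~READ_ONLY


def is_read_only(ff: int) -> bool:
    return bool(ff & READ_ONLY)


flags = with_read_only(REQUIRED)
print(flags, is_read_only(flags))            # 3 True
print(with_read_only(flags) == flags)        # True (setting twice is harmless)
print(without_read_only(flags) == REQUIRED)  # True
```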
This works for most fields, except for fields that are repeated on multiple pages or on the same page: those remain editable, while all other fields become read-only.
How can I account for the repeated fields as well?
Sample PDF file: https://filebin.net/orhtcmxp8c3b9cvh
Inside a PDF, field objects are stored as trees. In your case, there are 2 trees and 4 nodes:
Object 11 (text_1)
- Object 8
- Object 10
Object 12 (text_2)
Objects 8 and 10 have no name (no /T entry), which means they are two different representations (widgets) of the same actual field, their parent, object 11. In that case, the flags must be stored at the parent level. The code in the question only changes the leaves (the widgets), never the parent object, which is why it doesn't work for repeated fields.
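To make the parent/kid relationship concrete, here is a minimal pure-Python model of the sample file's tree. The `Node` class, `effective_flags`, and `build_sample` are hypothetical helpers, not part of PyMuPDF; the only assumption taken from the PDF specification is that /Ff is an inheritable field attribute, so unnamed kids pick it up from their parent. The /Ff of object 12 is not shown in the question, so it is left unset here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

READ_ONLY = 1 << 0  # /Ff bit 1 (ReadOnly)


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a field or widget dictionary."""
    num: int                         # object number
    name: Optional[str] = None       # /T entry; None for unnamed widget kids
    ff: Optional[int] = None         # /Ff entry; None when absent
    parent: Optional["Node"] = None  # /Parent entry
    kids: List["Node"] = field(default_factory=list)


def effective_flags(node: Optional[Node]) -> int:
    """Return the nearest /Ff found while walking up the /Parent chain."""
    while node is not None:
        if node.ff is not None:
            return node.ff
        node = node.parent
    return 0  # no /Ff anywhere in the chain: no flags set


def build_sample():
    """Model of the sample: text_1 (11) with kids 8 and 10, plus text_2 (12)."""
    text_1 = Node(11, name="text_1", ff=0)
    text_1.kids = [Node(8, parent=text_1), Node(10, parent=text_1)]
    text_2 = Node(12, name="text_2")
    return text_1, text_2


text_1, text_2 = build_sample()
w8, w10 = text_1.kids
text_1.ff |= READ_ONLY  # flag stored once, on the parent field
print([bool(effective_flags(w) & READ_ONLY) for w in (w8, w10)])  # [True, True]
```

Setting the flag once on the parent is enough for both widgets in this model, which is the point the answer makes about where the flag must live.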
You need lower-level code to set the flag on the right object. Here is one way to do it:
```python
import pymupdf
from pymupdf import mupdf


def set_readonly_field(field_obj):
    # OR in the ReadOnly bit, keeping any flags that are already set
    flags = mupdf.pdf_dict_get_int(field_obj, pymupdf.PDF_NAME("Ff"))
    flags |= pymupdf.PDF_FIELD_IS_READ_ONLY
    mupdf.pdf_dict_put_int(field_obj, pymupdf.PDF_NAME("Ff"), flags)


doc = pymupdf.open("multi_form_field.pdf")
for page in doc:
    for field in page.widgets():
        field_obj = mupdf.pdf_annot_obj(field._annot)
        print("Processing object %d" % field_obj.pdf_to_num())
        name = mupdf.pdf_dict_get_string(field_obj, pymupdf.PDF_NAME("T"))
        if name[0] == "":
            # This object has no name, so the flags belong on its parent
            parent_obj = mupdf.pdf_dict_get(field_obj, pymupdf.PDF_NAME("Parent"))
            if parent_obj.pdf_to_num() != 0:
                print("Found parent %d" % parent_obj.pdf_to_num())
                set_readonly_field(parent_obj)
        else:
            set_readonly_field(field_obj)
doc.save("output.pdf")
```
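The loop above visits parent 11 twice (once per widget), which is harmless because OR-ing the same bit is idempotent. If you prefer to write each field only once, the target-selection rule can be isolated and deduplicated. The sketch below models the sample file as plain dictionaries; `SAMPLE`, `WIDGETS`, `target_for`, and `make_readonly` are hypothetical names, not a PyMuPDF API, and only the keys the answer's code reads (/T, /Parent, /Ff) are modelled.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the sample file: object number -> dictionary entries.
SAMPLE: Dict[int, dict] = {
    11: {"T": "text_1", "Ff": 0, "Kids": [8, 10]},
    8: {"Parent": 11},
    10: {"Parent": 11},
    12: {"T": "text_2"},
}
WIDGETS = [8, 10, 12]  # widget annotations in the order the script visits them

READ_ONLY = 1 << 0  # /Ff bit 1


def target_for(widget: int, objects: Dict[int, dict]) -> Optional[int]:
    """Same rule as the answer: an unnamed widget defers to its /Parent."""
    obj = objects[widget]
    if obj.get("T", "") == "":
        return obj.get("Parent")  # None plays the role of object number 0
    return widget


def make_readonly(widgets: List[int], objects: Dict[int, dict]) -> List[int]:
    """Set ReadOnly once per target field and return the objects changed."""
    changed: List[int] = []
    for widget in widgets:
        target = target_for(widget, objects)
        if target is None or target in changed:
            continue  # no parent to update, or this field was already handled
        objects[target]["Ff"] = objects[target].get("Ff", 0) | READ_ONLY
        changed.append(target)
    return changed


print(make_readonly(WIDGETS, SAMPLE))  # [11, 12]
print(SAMPLE[11]["Ff"])                # 1
```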
Output:

```
Processing object 8
Found parent 11
Processing object 10
Found parent 11
Processing object 12
```
The original PDF contains:

```
11 0 obj
<</FT/Tx/Ff 0/Kids[8 0 R 10 0 R]/T(text_1)/V(text 1)>>
endobj
```

The generated PDF contains:

```
11 0 obj
<</FT/Tx/Ff 1/Kids[8 0 R 10 0 R]/T(text_1)/V(text 1)>>
endobj
```
Note that the `/Ff` entry (which stores the flags) changed from 0 to 1 for object 11, which means that the field `text_1` is now read-only.
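To check the result without opening a viewer, the /Ff value can be pulled out of an object dump such as the ones above and tested against bit 1. This is a quick sketch for simple dumps only, assuming the source text comes from something like PyMuPDF's `Document.xref_object()` or an uncompressed file: the regex expects a direct integer /Ff and would misread an indirect reference, and a missing entry may still be inherited from a parent field.

```python
import re

READ_ONLY = 1 << 0  # /Ff bit 1

# Object 11 as shown above, before and after running the script.
BEFORE = "<</FT/Tx/Ff 0/Kids[8 0 R 10 0 R]/T(text_1)/V(text 1)>>"
AFTER = "<</FT/Tx/Ff 1/Kids[8 0 R 10 0 R]/T(text_1)/V(text 1)>>"


def ff_value(obj_source: str) -> int:
    """Read a direct integer /Ff; treat a missing entry as 0 (no flags here)."""
    match = re.search(r"/Ff\s*(\d+)", obj_source)
    return int(match.group(1)) if match else 0


for label, src in (("before", BEFORE), ("after", AFTER)):
    ff = ff_value(src)
    print(label, ff, "read-only" if ff & READ_ONLY else "editable")
# before 0 editable
# after 1 read-only
```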