python pdf pymupdf mupdf

MuPDF: mark form field widgets as read-only


What I am trying to do is update form fields in a PDF with a value and then mark them as read-only so that they cannot be further edited.

I have tried two approaches. The first is to use

pdf.bake()

This did not work for my use case: after filling in the data, I e-sign the PDF. If I use bake, all the signatures show up on the first page, even though they should be distributed across different pages.

The second approach I tried is marking each widget as read-only:

  import fitz  # PyMuPDF

  # pdf_stream holds the PDF bytes (defined elsewhere)
  pdf_document = fitz.open("pdf", pdf_stream)
  READONLY_FLAG = 1 << 0  # lowest bit of the field flags: read-only
  processed_fields = set()
  for index, page in enumerate(pdf_document):
    for field in page.widgets():
      # only the first widget seen for each field name is updated
      if field.field_name not in processed_fields:
        print(f"non repeated name {field.field_name}, page {index}")
        processed_fields.add(field.field_name)
        field.field_flags |= READONLY_FLAG
        field.update()

This works for most fields, except for fields that are repeated on multiple pages or on the same page. Those fields are still editable, while the others become non-editable.

How can I account for repeated fields as well?

Sample PDF file: https://filebin.net/orhtcmxp8c3b9cvh


Solution

  • Inside a PDF, field objects are stored as trees. In your case, there are 2 trees and 4 nodes:

    Object 11 (text_1)
    - Object 8
    - Object 10
    
    Object 12 (text_2)
    

    Objects 8 and 10 have no name, which means that they are different representations of the same actual field (their parent, object 11). In that case, the flags must be stored at the parent level. The code that you showed changes only the leaves and never the parent object, which is why it doesn't work for the repeated field.

    You need more low-level code to properly change the fields to read-only. Here is a way to do it:

    import pymupdf
    from pymupdf import mupdf
    
    def set_readonly_field(field_obj):
        flags = mupdf.pdf_dict_get_int(field_obj, pymupdf.PDF_NAME("Ff"))
        flags |= pymupdf.PDF_FIELD_IS_READ_ONLY
        mupdf.pdf_dict_put_int(field_obj, pymupdf.PDF_NAME("Ff"), flags)
    
    doc = pymupdf.open("multi_form_field.pdf") 
    for page in doc:
        for field in page.widgets():
            field_obj = mupdf.pdf_annot_obj(field._annot)
            print("Processing object %d" % field_obj.pdf_to_num())
            name = mupdf.pdf_dict_get_string(field_obj, pymupdf.PDF_NAME("T"))
            if name[0] == "":
                # This object has no name, so we're looking at its parent
                parent_obj = mupdf.pdf_dict_get(field_obj, pymupdf.PDF_NAME("Parent"))
                if parent_obj.pdf_to_num() != 0:
                    print("Found parent %d" % parent_obj.pdf_to_num())
                    set_readonly_field(parent_obj)
            else:
                set_readonly_field(field_obj)
    doc.save("output.pdf")
    

    Output:

    Processing object 8
    Found parent 11
    Processing object 10
    Found parent 11
    Processing object 12
    

    The original PDF contains:

    11 0 obj
    <</FT/Tx/Ff 0/Kids[8 0 R 10 0 R]/T(text_1)/V(text 1)>>
    endobj
    

    The generated PDF contains:

    11 0 obj
    <</FT/Tx/Ff 1/Kids[8 0 R 10 0 R]/T(text_1)/V(text 1)>>
    endobj
    

    Note that the /Ff entry (which stores the flags) changed from 0 to 1 for object 11, which means that field text_1 is now read-only.
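
To make the parent/leaf logic easier to see without a PDF at hand, here is a minimal sketch that models the four objects above as plain Python dicts. It is only an illustration, not PyMuPDF code: the names `objects`, `owning_field` and `mark_read_only` are made up for this example, and unlike the snippet above it keeps climbing /Parent links until it reaches a named object, which is an assumption that also covers deeper trees.

```python
# Toy model of the sample PDF's field tree; plain dicts stand in for PDF objects.
READ_ONLY = 1 << 0  # lowest bit of /Ff (bit position 1 in the PDF specification)

objects = {
    11: {"T": "text_1", "Ff": 0, "Kids": [8, 10]},
    8: {"Parent": 11},   # unnamed widget of text_1
    10: {"Parent": 11},  # unnamed widget of text_1
    12: {"T": "text_2", "Ff": 0},
}

def owning_field(objects, num):
    """Climb /Parent links until an object that has a name (/T) is found."""
    while "T" not in objects[num] and "Parent" in objects[num]:
        num = objects[num]["Parent"]
    return num

def mark_read_only(objects, widget_nums):
    """Set the read-only bit once per owning field; return the objects changed."""
    changed = []
    for num in widget_nums:
        owner = owning_field(objects, num)
        flags = objects[owner].get("Ff", 0)
        if not flags & READ_ONLY:
            objects[owner]["Ff"] = flags | READ_ONLY
            changed.append(owner)
    return changed

changed = mark_read_only(objects, [8, 10, 12])
print(changed)  # [11, 12]
```

Widgets 8 and 10 both resolve to object 11, so the flag is written once on the parent and the leaves are left untouched, which matches the /Ff change shown for object 11.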