From time to time we need to remove the write protection ("encryption") and the digital signatures from our PDF documents so that a document can be changed and re-signed, e.g. because the original document is missing, or because it was changed and its digital signatures became invalid (e.g. this document).
For this, we used the following iText 8 code. (Admittedly, flattening the AcroForm is not the best way, e.g. because interactive form fields stop being interactive.)
public static byte[] cleanUpPdfItext(byte[] originalPdfData) throws Exception {
    // Read the PDF document
    try (
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        PdfReader pdfReader = new PdfReader(new ByteArrayInputStream(originalPdfData)).setUnethicalReading(true);
        PdfWriter pdfWriter = new PdfWriter(byteArrayOutputStream);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter)
    ) {
        // Create the signature utils
        SignatureUtil signatureUtil = new SignatureUtil(pdfDocument);
        // Check if encrypted and/or contains signatures
        boolean isEncrypted = pdfReader.isEncrypted();
        boolean hasSignatures = !signatureUtil.getSignatureNames().isEmpty();
        // Handle all cases
        if (isEncrypted && hasSignatures) { // Encrypted and signatures
            // Remove the signatures
            PdfAcroForm form = PdfAcroForm.getAcroForm(pdfDocument, true);
            form.flattenFields();
            // Write the changes to the output stream, so we can read them
            pdfDocument.close();
            // Get the manipulated document
            return byteArrayOutputStream.toByteArray();
        } else if (isEncrypted) { // Encrypted but no signatures
            // Write the changes to the output stream, so we can read them
            pdfDocument.close();
            // Get the manipulated document
            return byteArrayOutputStream.toByteArray();
        } else { // Not encrypted/no signatures
            // Return the original document data
            return originalPdfData;
        }
    }
}
Question: What is the equivalent code to do this with PDFBox, i.e. remove the write protection ("encryption") and remove all existing signatures (this part is still missing), so that the document can be edited and re-signed?
I came up with this initial version:
public static byte[] cleanUpPdfbox(byte[] original) throws Exception {
    try (
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        PDDocument pdDocument = Loader.loadPDF(original)
    ) {
        // Check if encrypted or read only
        AccessPermission accessPermission = pdDocument.getCurrentAccessPermission();
        boolean isEncrypted = pdDocument.isEncrypted() || accessPermission.isReadOnly();
        // Check if signatures exist
        boolean hasSignatures = false;
        PDAcroForm pdAcroForm = pdDocument.getDocumentCatalog().getAcroForm();
        if (pdAcroForm != null) {
            // Get a list of all signature dictionaries
            List<PDSignature> pdSignatures = pdDocument.getSignatureDictionaries();
            hasSignatures = !pdSignatures.isEmpty();
        }
        // Remove all security if required
        if (isEncrypted) {
            pdDocument.setAllSecurityToBeRemoved(true);
        }
        // Remove all signatures
        if (hasSignatures) {
            // TODO: Code in question
        }
        // Write the document
        pdDocument.save(byteArrayOutputStream);
        return byteArrayOutputStream.toByteArray();
    }
}
If the file had revisions (incremental updates), you could cut it off after the second-to-last %%EOF. However, this file doesn't use revisions. The following solution removes the signature field from the fields array (in the hope that it is on the top level), removes its widget from the annotation array of the page, and removes the Perms entry from the document catalog.
try (PDDocument doc = Loader.loadPDF(new File("Encrypted.pdf")))
{
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    List<PDField> oldFieldList = acroForm.getFields();
    List<PDField> newFieldList = new ArrayList<>();
    for (PDField field : oldFieldList)
    {
        if (!(field instanceof PDSignatureField))
        {
            newFieldList.add(field);
        }
    }
    acroForm.setFields(newFieldList);
    for (PDPage page : doc.getPages())
    {
        List<PDAnnotation> oldAnnotationList = page.getAnnotations();
        List<PDAnnotation> newAnnotationList = new ArrayList<>();
        for (PDAnnotation ann : oldAnnotationList)
        {
            if (ann instanceof PDAnnotationWidget && ann.getCOSObject().containsKey(COSName.V))
            {
                continue;
            }
            newAnnotationList.add(ann);
        }
        page.setAnnotations(newAnnotationList);
    }
    doc.setAllSecurityToBeRemoved(true);
    doc.getDocumentCatalog().getCOSObject().removeItem(COSName.PERMS);
    doc.save(new File("SO79055588-saved.pdf"));
}
It is possible for a signature field to be below the top level, although I can't remember ever having seen this. If you want to handle that case, check whether a field is of type PDNonTerminalField, call getChildren(), run the same for-loop as for the top level, and do this recursively.
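A minimal sketch of that recursion, untested against the file from the question. The class name SignatureFieldPruner and its two methods are made up for illustration; getFields(), setFields(), getChildren() and setChildren() are existing PDFBox methods.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDNonTerminalField;
import org.apache.pdfbox.pdmodel.interactive.form.PDSignatureField;

// Hypothetical helper (not part of PDFBox): removes signature fields at any depth.
final class SignatureFieldPruner
{
    private SignatureFieldPruner()
    {
    }

    // Replaces the top-level field list with a copy that contains no signature fields.
    static void removeSignatureFields(PDAcroForm acroForm)
    {
        if (acroForm == null)
        {
            return; // document has no form, so there is nothing to remove
        }
        acroForm.setFields(prune(acroForm.getFields()));
    }

    // Same for-loop as for the top level, applied recursively to the children.
    private static List<PDField> prune(List<PDField> fields)
    {
        List<PDField> kept = new ArrayList<>();
        for (PDField field : fields)
        {
            if (field instanceof PDSignatureField)
            {
                continue; // drop the signature field
            }
            if (field instanceof PDNonTerminalField)
            {
                PDNonTerminalField parent = (PDNonTerminalField) field;
                // A parent that ends up without children is kept as it is.
                parent.setChildren(prune(parent.getChildren()));
            }
            kept.add(field);
        }
        return kept;
    }
}
```

This only replaces the first for-loop of the solution above, e.g. with SignatureFieldPruner.removeSignatureFields(doc.getDocumentCatalog().getAcroForm()). The widgets still have to be removed from the page annotations as shown there.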
(Update) Alternative solution, which is the one I originally wrote first (see mkl's comment): the signatures are deactivated but will still appear in the list:
try (PDDocument doc = Loader.loadPDF(new File("Encrypted.pdf")))
{
    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDSignatureField)
        {
            ((PDSignatureField) field).setValue((PDSignature) null);
            ((PDSignatureField) field).getWidgets().get(0).setAppearance(null);
            ((PDSignatureField) field).getWidgets().get(0).setRectangle(new PDRectangle());
        }
    }
    doc.setAllSecurityToBeRemoved(true);
    doc.getDocumentCatalog().getCOSObject().removeItem(COSName.PERMS);
    doc.save(new File("SO79055588-saved.pdf"));
}
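For files that do have revisions, the cut-off after the second-to-last %%EOF mentioned at the start of this answer could be sketched as follows. This is plain Java without PDFBox; the class RevisionTruncator and its methods are hypothetical names. It assumes that every %%EOF in the file really ends a revision, so a marker inside a stream or the extra %%EOF of a linearized file would fool it. It drops only the last revision, so it would have to be repeated if signatures were added in several revisions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: drops the last incremental revision of a PDF byte array.
final class RevisionTruncator
{
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    private RevisionTruncator()
    {
    }

    // Returns the start index of the last marker beginning at or before fromIndex, or -1.
    static int lastIndexOf(byte[] data, byte[] marker, int fromIndex)
    {
        for (int i = Math.min(fromIndex, data.length - marker.length); i >= 0; i--)
        {
            boolean match = true;
            for (int j = 0; j < marker.length; j++)
            {
                if (data[i + j] != marker[j])
                {
                    match = false;
                    break;
                }
            }
            if (match)
            {
                return i;
            }
        }
        return -1;
    }

    // Cuts the data after the second-to-last %%EOF; returns the input unchanged
    // if fewer than two markers are found.
    static byte[] dropLastRevision(byte[] pdf)
    {
        int last = lastIndexOf(pdf, EOF_MARKER, pdf.length);
        if (last < 0)
        {
            return pdf; // no marker at all
        }
        int previous = lastIndexOf(pdf, EOF_MARKER, last - 1);
        if (previous < 0)
        {
            return pdf; // only one revision
        }
        int end = previous + EOF_MARKER.length;
        // Keep the end-of-line that follows the marker, if there is one.
        if (end < pdf.length && pdf[end] == '\r')
        {
            end++;
        }
        if (end < pdf.length && pdf[end] == '\n')
        {
            end++;
        }
        return Arrays.copyOf(pdf, end);
    }
}
```

The result of dropLastRevision could then be passed to Loader.loadPDF(byte[]) to check what is left.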