I have an issue with digitally signing PDF documents that have been marked as PDF/A-3A compliant. With PDFBox (latest version, 2.0.24) I end up with an invalid signature in Adobe Acrobat, while with iText7 (latest version) I get a valid signature. The goal is to get PAdES LTV compliant signatures.
My process is the following (with both PDFBox and iText7):
For PDFBox, the code for signing is here and for OCSP/CRL embedding is here. For iText7, the code for signing and for OCSP/CRL embedding is here.
Now, this works OK for most PDF files, including multi-signature documents. The problem is with one particular PDF, which was created as PDF/A compliant, level 3A.
With PDFBox, if I just embed the signature and open the document in Adobe Acrobat, the signature is valid. If I also embed the OCSP/CRL content, the signature is no longer valid. Adobe Acrobat complains that:
Signature is invalid: Document has been altered or corrupted since it was signed.
I also noticed that just by doing:
PDDocument document = PDDocument.load(inputStream);
document.save(outputStream);
I break the signature. From my tests, the actual embedding is not really the cause of the issue, but just the fact that I reopen the PDF after embedding the signature and save it back to disk.
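For context on why such a round trip is risky for a signed file: a PDF signature covers fixed byte ranges of the file, and `PDDocument.save()` writes a complete new file instead of appending to the existing one. The following is only a toy model of that effect using the JDK alone, not PDFBox code. The class name `ByteRangeDigest`, the `HEAD<sig>TAIL` stand-in for a signed file, and the range values are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class ByteRangeDigest {
    /** Hashes the bytes selected by a ByteRange-style array [start1, len1, start2, len2]. */
    static byte[] digest(byte[] file, int[] byteRange) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < byteRange.length; i += 2) {
                md.update(file, byteRange[i], byteRange[i + 1]);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical "signed file": the range covers everything except "<sig>".
        byte[] signed = "HEAD<sig>TAIL".getBytes(StandardCharsets.US_ASCII);
        int[] range = {0, 4, 9, 4};

        // Incremental update: new bytes are appended, the signed bytes stay in place.
        byte[] appended = "HEAD<sig>TAIL+UPDATE".getBytes(StandardCharsets.US_ASCII);
        // Full rewrite: bytes inside the signed range may be serialized differently.
        byte[] rewritten = "HAED<sig>TAIL".getBytes(StandardCharsets.US_ASCII);

        byte[] reference = digest(signed, range);
        System.out.println(Arrays.equals(reference, digest(appended, range)));
        System.out.println(Arrays.equals(reference, digest(rewritten, range)));
    }
}
```

In this model the appended variant keeps the same digest over the signed range, while the rewritten one does not.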
With the same process (keys, certificate, etc) via iText7 I get a valid LTV signature in the end, in Adobe Acrobat.
The sample documents are here. The original contains the unsigned document, and then there are 2 samples, one for PDFBox (invalid in Adobe Acrobat) and one for iText7 (valid in Adobe Acrobat).
My research so far shows that somehow PDFBox is breaking the order of the elements when loading the PDF after signature embedding. It hints at this issue with loading and saving documents, though for ALL the other PDFs I do the same process and Adobe Acrobat does not complain about the signature.
I also tried with PDFBox 2.1.0-SNAPSHOT and 3.0.0-SNAPSHOT, hoping that the issue is related to ordering of elements in PDF and it was fixed. Still, I get the same results.
Please see the Later edit 2 below, this Later edit 1 here is not a good idea!
As per the accepted answer below from @mkl, the issue is with the original PDF file, which contains the cross reference table split into several subsections instead of one. This seems to be caused by the library (Aspose PDF for .NET, version 21.3 or earlier) used by the service that generated the PDF in the first place.
One workaround that seems to work with my current code is the following:
PDDocumentInformation info = pdDocument.getDocumentInformation();
if (info != null && StringUtils.containsIgnoreCase(info.getProducer(), "Aspose")) {
    try {
        // Re-save in memory so that PDFBox writes a fresh cross reference table
        pdDocument.save(inMemoryStream);
        pdDocument.close();
        pdDocument = PDDocument.load(inMemoryStream.toByteArray());
        inMemoryStream.reset();
    } catch (Exception e) {
        // error handling omitted
    }
}
Basically if I detect that the producer of the document is Aspose, I save the document in memory (via PDFBox' pdDocument.save()) and load it back. This ensures the cross reference table is written correctly in memory and from there the signing and OCSP+CRL embedding works as expected, yielding a valid signature in Adobe Acrobat.
Thank you @mkl and @TilmanHausherr, you are right. It is not a good idea to assume that all documents produced with a certain library have to automatically be normalized, as existing signatures will be invalidated. In the end, the better idea is to keep the code as it was and expect a properly constructed PDF. Fix the problem where it is created.
The problem is caused by an error in the original PDF. Your PDFBox code signs in append mode (i.e. in an incremental update), so that error is present in the signed version, too. Your iText code does not sign in append mode but instead re-writes the whole PDF; while doing so it does not make the same error as the producer of your original PDF, so the error is not in the signed version anymore. Adobe Acrobat is very sensitive to such issues when validating signatures with updates.
The cross reference table of the initial revision in a PDF must not be split into separate subsections, but in the case of your original PDF it has been split:
0 75
0000000000 65535 f
0000000018 00000 n
...
0000313374 00000 n
0000313397 00000 n
76 20
0000313419 00000 n
0000313443 00000 n
...
0000846048 00000 n
0000846175 00000 n
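A rough way to spot such a split table programmatically is to count the subsection headers, i.e. the two-number lines such as `0 75` and `76 20` above. The sketch below uses only the JDK and assumes the text between the `xref` keyword and `trailer` has already been extracted. The class and method names are hypothetical, and it only looks at headers, so it is not a general xref parser.

```java
import java.util.ArrayList;
import java.util.List;

final class XrefSubsections {
    /** Collects subsection headers as {firstObjectNumber, entryCount} pairs. */
    static List<int[]> parse(String xrefBody) {
        List<int[]> headers = new ArrayList<>();
        for (String line : xrefBody.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            // Entry lines have three fields (offset, generation, n/f); headers have two.
            if (parts.length == 2 && parts[0].matches("\\d+") && parts[1].matches("\\d+")) {
                headers.add(new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
            }
        }
        return headers;
    }

    /** True if the table consists of exactly one subsection starting at object 0. */
    static boolean isSingleSubsectionFromZero(List<int[]> headers) {
        return headers.size() == 1 && headers.get(0)[0] == 0;
    }

    public static void main(String[] args) {
        // Abbreviated version of the table shown above (most entries left out).
        String body = String.join("\n",
                "0 75",
                "0000000000 65535 f",
                "0000000018 00000 n",
                "76 20",
                "0000313419 00000 n");
        List<int[]> headers = parse(body);
        System.out.println(headers.size() + " subsections, single=" + isSingleSubsectionFromZero(headers));
        // prints "2 subsections, single=false"
    }
}
```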
Similar cases have been discussed in this answer, this answer, this answer, and elsewhere; you can also find some specification references in those answers.
Usually this goes unnoticed, as Adobe Acrobat is quite lax when encountering small issues in PDFs.
Usually, that is, except when validating documents with integrated signatures and incremental updates after the signed revision. In that situation Adobe Acrobat often considers such issues suspect and fails validation of the signature, even though it doesn't complain when validating the same PDF without the incremental updates after the signed revision.
You are in exactly that critical situation: your final document contains an incremental update after the signed revision, namely the update with validation related information.
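To check quickly whether a given file has updates after its first revision, one can count the `%%EOF` markers, since each revision normally ends with one. This is only a heuristic sketch using the JDK: the marker can also occur inside stream data, and linearized files contain an extra one. The class name `RevisionCounter` and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class RevisionCounter {
    /** Counts occurrences of the "%%EOF" marker in the raw file bytes. */
    static int countEofMarkers(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        for (int i = 0; i + marker.length <= pdf.length; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdf[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += marker.length - 1; // continue right after this marker
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for: original revision + signature revision + validation data revision.
        String sample = "%PDF-1.7\n(original)\n%%EOF\n(signature)\n%%EOF\n(validation data)\n%%EOF\n";
        System.out.println(countEofMarkers(sample.getBytes(StandardCharsets.US_ASCII)));
        // prints 3
    }
}
```

For a real file, the bytes could be read with `java.nio.file.Files.readAllBytes` and passed to the same method.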
According to the Info dictionary of your original PDF, it has been produced by "Aspose.PDF for .NET 21.3.0". Earlier versions of Aspose.PDF are known to create such faulty cross reference tables (see section "The PDF processor that damages the PDF" of the first answer referenced above). Apparently Aspose have not yet gotten around to fixing this issue for good.