Tags: pdf, digital-signature, itext7

How to interpret signatureCoversWholeDocument() == false?


When trying to validate a certain PDF's signature, I used the following code.

Validation results look good, apart from one thing: SignatureUtil#signatureCoversWholeDocument returns false.

It's obvious what that means, but I'm not sure how to interpret it.

How can I determine which parts of the document aren't covered by the signature?

Can some evil guy change the document's content (if it's uncovered) while still keeping a valid signature?

In other words: how can I make sure that this is nothing to worry about?


Solution

  • You say it's obvious what it means that SignatureUtil#signatureCoversWholeDocument returns false, but just to be sure, first some background.

    What Does It Mean When a PDF Signature Does Not Cover the Whole Document

    At the moment they are applied, PDF signatures cover their whole respective document, except, of course, for the embedded signature container itself, or more exactly a placeholder for it, which may be somewhat larger:

    [Sketch: the signed byte ranges before and after the signature placeholder]

    The ranges of signed bytes (from the file start to the placeholder start, and from after the placeholder end to the file end) are specified in the signed part of the PDF, namely in the /ByteRange entry of the signature dictionary.
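    To make the relationship between the byte ranges and "covers the whole document" concrete, here is a minimal sketch. It assumes you already have the four integers of the /ByteRange array [start1 length1 start2 length2] and the total file length; the class and method names are hypothetical helpers, not iText API, and iText's own check may differ in details (it may, for example, also inspect the gap between the ranges).

```java
/**
 * Hypothetical helper (not part of iText): checks a signature's /ByteRange
 * array [start1 length1 start2 length2] against the total file length.
 */
final class ByteRangeCoverage {

    private ByteRangeCoverage() {
    }

    /** True if the two signed ranges start at offset 0 and end exactly at the end of the file. */
    static boolean coversWholeFile(long[] byteRange, long fileLength) {
        if (byteRange.length != 4) {
            throw new IllegalArgumentException("expected [start1 length1 start2 length2]");
        }
        long start1 = byteRange[0];
        long length1 = byteRange[1];
        long start2 = byteRange[2];
        long length2 = byteRange[3];
        // The first range must begin at the file start, the second must begin after
        // the placeholder gap, and the second must end at the very last byte.
        return start1 == 0 && start2 >= start1 + length1 && start2 + length2 == fileLength;
    }

    /** Bytes after the end of the second signed range, i.e. bytes not covered by the signature. */
    static long unsignedTrailingBytes(long[] byteRange, long fileLength) {
        return fileLength - (byteRange[2] + byteRange[3]);
    }
}
```

    For a /ByteRange of [0 1000 9000 500], a 9500-byte file is fully covered, while a 12000-byte file has 2500 trailing bytes that the signature does not cover.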

    Now the PDF format allows changing a PDF document not only by rebuilding the whole document from scratch but alternatively also by appending the changes after its end in so-called incremental updates.
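    Each revision created this way normally ends with its own "%%EOF" marker. As a rough heuristic only (an assumption for illustration, not how iText counts revisions), counting those markers hints at how many incremental updates a file has. It can overcount for linearized files, which contain an extra marker, or if the byte sequence happens to occur in stream data; iText 7's SignatureUtil#getTotalRevisions is the parser-based alternative.

```java
import java.nio.charset.StandardCharsets;

/**
 * Rough heuristic (hypothetical helper): counts "%%EOF" markers in the raw
 * file bytes as a hint at the number of revisions.
 */
final class EofCounter {

    private EofCounter() {
    }

    static int countEofMarkers(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        int count = 0;
        for (int i = raw.indexOf("%%EOF"); i >= 0; i = raw.indexOf("%%EOF", i + 5)) {
            count++;
        }
        return count;
    }
}
```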

    As the byte ranges signed by a PDF signature are specified in the document, this mechanism can even be used to add changes to a signed PDF without cryptographically breaking the signature.
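    Because the ranges are stated inside the file, appended bytes simply fall outside them and the declared /ByteRange values stay unchanged. The following sketch pulls /ByteRange arrays out of the raw bytes with a regular expression. This naive text scan is an assumption for illustration and can miss or misread entries; a real validator should read the signature dictionary through a proper PDF parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: naive text scan for /ByteRange arrays in raw PDF bytes. */
final class ByteRangeScanner {

    // Matches e.g. "/ByteRange [0 840 960 240]" with arbitrary whitespace.
    private static final Pattern BYTE_RANGE = Pattern.compile(
            "/ByteRange\\s*\\[\\s*(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s*\\]");

    private ByteRangeScanner() {
    }

    static List<long[]> findByteRanges(byte[] pdf) {
        // ISO-8859-1 decoding keeps a one-to-one mapping between bytes and chars.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        Matcher m = BYTE_RANGE.matcher(raw);
        List<long[]> result = new ArrayList<>();
        while (m.find()) {
            result.add(new long[] {
                    Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                    Long.parseLong(m.group(3)), Long.parseLong(m.group(4))});
        }
        return result;
    }
}
```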

    This mechanism can be used, for example, to apply multiple PDF signatures to a document:

    [Sketch: a PDF with multiple signatures, each added in its own incremental update]

    But the mechanism can also be used for myriad other kinds of changes.

    Allowed and Disallowed Changes

    If one can add arbitrary changes to a signed PDF without breaking the signature cryptographically, one may wonder what the value of the signature is to start with.

    Of course, one can always extract and display or process the PDF revision a PDF signature covers (simply take the partial document from its start to the end of the second signed byte range). Thus, it is clear what the original PDF fully covered by the signature looked like. So a signed PDF can be considered a collection of logical documents, loosely based on one another: for each signature, the revision covered by it, plus (if there are additional unsigned additions) the full document.
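    Extracting that revision is just a prefix copy of the file. A minimal sketch, assuming the /ByteRange values are already known (the helper name is hypothetical); iText 7's SignatureUtil#extractRevision(String) returns the same revision as an InputStream.

```java
import java.util.Arrays;

/** Hypothetical helper: extracts the revision a signature covers. */
final class RevisionExtractor {

    private RevisionExtractor() {
    }

    /** Returns the bytes from the file start up to the end of the second signed byte range. */
    static byte[] signedRevision(byte[] pdf, long[] byteRange) {
        if (byteRange.length != 4) {
            throw new IllegalArgumentException("expected [start1 length1 start2 length2]");
        }
        long end = byteRange[2] + byteRange[3];
        if (end > pdf.length) {
            throw new IllegalArgumentException("byte range exceeds file length");
        }
        return Arrays.copyOf(pdf, Math.toIntExact(end));
    }
}
```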

    There actually are use cases where this makes sense, for example a document created by a number of authors, each signing off on their respectively edited version of the document.

    But the use cases in which that view is too diffuse are more numerous (or at least more important) still. In particular, there are numerous use cases with multiple signatures in which one wants a PDF to represent a single logical document signed by multiple persons, at most with a few extra form fill-ins after the first signature.

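    To support such use cases, the PDF specification defines a number of sets of allowed changes. Such a set can be selected by the first signature of the document (a so-called certification signature). For details on these sets of allowed changes, see this answer. In particular, depending on the selected set, such allowed changes may encompass adding further signatures, filling in form fields, and creating, editing, or deleting annotations.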
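    The set is selected via the /P value of the DocMDP transform parameters: 1 permits no changes, 2 (the default) permits filling in forms, instantiating page templates, and signing, and 3 additionally permits creating, deleting, and modifying annotations. The sketch below encodes just that table. It is a simplification: it ignores FieldMDP locks, and the change categories are hypothetical labels, because mapping low-level object changes onto them is precisely the hard part.

```java
import java.util.EnumSet;
import java.util.Set;

/** Simplified table of DocMDP permission levels (hypothetical helper). */
final class DocMdpLevels {

    /** Value assumed when the DocMDP transform parameters contain no /P entry. */
    static final int DEFAULT_P = 2;

    /** Coarse change categories named by the DocMDP permission levels. */
    enum Change { FORM_FILL_IN, PAGE_TEMPLATE_INSTANTIATION, SIGNING, ANNOTATION_EDIT, OTHER }

    private DocMdpLevels() {
    }

    /** Returns the change categories a certification signature with the given /P value permits. */
    static Set<Change> allowedChanges(int p) {
        switch (p) {
            case 1:
                return EnumSet.noneOf(Change.class);
            case 2:
                return EnumSet.of(Change.FORM_FILL_IN, Change.PAGE_TEMPLATE_INSTANTIATION, Change.SIGNING);
            case 3:
                return EnumSet.of(Change.FORM_FILL_IN, Change.PAGE_TEMPLATE_INSTANTIATION,
                        Change.SIGNING, Change.ANNOTATION_EDIT);
            default:
                throw new IllegalArgumentException("DocMDP /P must be 1, 2 or 3");
        }
    }

    static boolean isAllowed(int p, Change change) {
        return allowedChanges(p).contains(change);
    }
}
```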

    Determining the Changes in a PDF and Checking Whether They Are Allowed in Practice

    In the light of the previous section, the OP's question boils down to how one can determine the nature of the changes in the incremental updates and whether they are allowed or disallowed.

    Determining which low-level objects in a PDF have changed is actually not that difficult; see the PdfCompare and PdfRevisionCompare classes in this answer.
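    At its core, such a comparison diffs the objects of two revisions by object number. The following is a hypothetical sketch, not the PdfCompare or PdfRevisionCompare code: it assumes some PDF parser has already produced, for each revision, a map from object key (e.g. "12 0") to a normalized serialization of the object. Note that in real incremental updates deleted objects are usually marked free in the cross-reference section rather than simply missing, and naive serialization can flag purely cosmetic differences.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: diffs two revisions given as object-key to serialized-object maps. */
final class ObjectDiff {

    /** Object keys that differ between two revisions, split by kind of change. */
    static final class Result {
        final Set<String> added = new TreeSet<>();
        final Set<String> removed = new TreeSet<>();
        final Set<String> changed = new TreeSet<>();
    }

    private ObjectDiff() {
    }

    static Result diff(Map<String, String> older, Map<String, String> newer) {
        Result r = new Result();
        for (Map.Entry<String, String> e : newer.entrySet()) {
            if (!older.containsKey(e.getKey())) {
                r.added.add(e.getKey());                  // object only in the newer revision
            } else if (!Objects.equals(older.get(e.getKey()), e.getValue())) {
                r.changed.add(e.getKey());                // same key, different content
            }
        }
        for (String key : older.keySet()) {
            if (!newer.containsKey(key)) {
                r.removed.add(key);                       // object only in the older revision
            }
        }
        return r;
    }
}
```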

    The real problem is checking whether these low-level changes serve only to implement an allowed change as specified (or do not change the document semantically at all)!

    Here even the "gold standard" (i.e. Adobe Acrobat) errs again and again, both by failing to recognize disallowed changes (see e.g. the "Attacks on PDF Certification" on pdf-insecurity.org for examples that have since been fixed) and by failing to recognize allowed changes (see e.g. here and here).

    Essentially, this is a very difficult task, and it is very unlikely that you will find a good implementation of it in an open source project.

    In particular, iText 7 does not include such an analysis, so if you need it, you'll have to implement it yourself.

    You can simplify the task a bit if you expect the changes to be applied by specific software. In that case, you can analyze how (in terms of low-level objects) that software applies allowed changes and accept only such low-level changes. For example, Adobe Acrobat is really good at recognizing allowed changes applied by Adobe software to PDFs created by Adobe software.
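    That approach amounts to a whitelist. In the hypothetical sketch below, each changed object has already been classified (by an analysis not shown here) into a pattern label such as "new-signature-field"; these labels are invented for illustration. An update is accepted only if every label belongs to the patterns you have observed the expected software to produce, and anything unknown is rejected rather than guessed at.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical whitelist check over pre-classified low-level changes. */
final class ChangeWhitelist {

    private ChangeWhitelist() {
    }

    /** Returns the keys of changed objects whose pattern label is not whitelisted, sorted. */
    static List<String> rejectedObjects(Map<String, String> changeKindByObject, Set<String> allowedKinds) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> e : changeKindByObject.entrySet()) {
            if (!allowedKinds.contains(e.getValue())) {
                rejected.add(e.getKey());
            }
        }
        Collections.sort(rejected);
        return rejected;
    }

    /** Accepts the update only if every change matches a known, allowed pattern. */
    static boolean acceptUpdate(Map<String, String> changeKindByObject, Set<String> allowedKinds) {
        return rejectedObjects(changeKindByObject, allowedKinds).isEmpty();
    }
}
```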