itext redaction

Attempt to apply redactions results in exception


I have followed the steps to create annotations and apply redaction using iText 5.5.9. Here is my code:

using (var stamper = new PdfStamper(pdfReader, new FileStream(newFilePath, FileMode.Create)))
{
    // Redact the values.
    var pdfAnot1 = new PdfAnnotation(stamper.Writer, new Rectangle(165f, 685f, 320f, 702f));
    pdfAnot1.Title = "First Page";
    pdfAnot1.Put(PdfName.SUBTYPE, PdfName.REDACT);
    pdfAnot1.Put(PdfName.IC, new PdfArray(new[] { 0f, 0f, 0f }));
    pdfAnot1.Put(PdfName.OC, new PdfArray(new[] { 1f, 0f, 0f })); // red outline
    stamper.AddAnnotation(pdfAnot1, 1);
    for (var i = 1; i <= pdfReader.NumberOfPages; i++)
    {
        var pdfAnot2 = new PdfAnnotation(stamper.Writer, new Rectangle(220f, 752f, 420f, 768f));
        pdfAnot2.Title = "Header";
        pdfAnot2.Put(PdfName.SUBTYPE, PdfName.REDACT);
        pdfAnot2.Put(PdfName.IC, new PdfArray(new[] { 0f, 0f, 0f }));
        pdfAnot2.Put(PdfName.OC, new PdfArray(new[] { 1f, 0f, 0f })); // red outline
        stamper.AddAnnotation(pdfAnot2, i);
    }

    var cleaner = new PdfCleanUpProcessor(stamper);
    cleaner.CleanUp();
}

However, I always receive the following exception on PdfCleanUpProcessor construction:

Object reference not set to an instance of an object.
   at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnots(Int32 page, PdfDictionary pageDict)
   at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor.ExtractLocationsFromRedactAnnots()
   at iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup.PdfCleanUpProcessor..ctor(PdfStamper pdfStamper)

It appears that the assignment of annotDict in ExtractLocationsFromRedactAnnots produces a null reference, so the next line throws the exception:

    /**
     * Extracts locations from the redact annotations contained in the document and applied to the given page.
     */
    private IList<PdfCleanUpLocation> ExtractLocationsFromRedactAnnots(int page, PdfDictionary pageDict) {
        List<PdfCleanUpLocation> locations = new List<PdfCleanUpLocation>();

        if (pageDict.Contains(PdfName.ANNOTS)) {
            PdfArray annotsArray = pageDict.GetAsArray(PdfName.ANNOTS);

            for (int i = 0; i < annotsArray.Size; ++i) {
                PdfIndirectReference annotIndirRef = annotsArray.GetAsIndirectObject(i);
                PdfDictionary annotDict = annotsArray.GetAsDict(i);
                PdfName annotSubtype = annotDict.GetAsName(PdfName.SUBTYPE);

                if (annotSubtype.Equals(PdfName.REDACT)) {
                    SaveRedactAnnotIndirRef(page, annotIndirRef.ToString());
                    locations.AddRange(ExtractLocationsFromRedactAnnot(page, i, annotDict));
                }
            }
        }

        return locations;
    }

Any idea why this is happening? An example PDF is here.


Solution

  • There are two issues at work here, one being in the OP's code and one in iText(Sharp).

    Issue in the OP's code

    One has to be aware that the PdfReader/PdfStamper pair does not work like an in-memory document that is manipulated freely and only saved at the end. Instead, manipulations made through the stamper are usually written to the output stream as soon as possible and are not necessarily visible to other code working on the same stamper.

    The rationale is that the iText architecture (as wild as it may seem in versions before 7.x) is built to allow operations with a low resource footprint. In server applications which may have to process many PDFs in parallel this is very important.

    In the case at hand, the OP's code first adds Redact annotations and then, in the same pass, tries to clean up using these annotations. This does not work. Instead, the OP should either add the annotations in one pass and apply the cleanup in a second, i.e.

    using (PdfReader pdfReader = new PdfReader(source))
    using (var stamper = new PdfStamper(pdfReader, new FileStream(temp, FileMode.Create)))
    {
        // ... add REDACT annotations
    }
    
    using (PdfReader pdfReader = new PdfReader(temp))
    using (var stamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create)))
    {
        var cleaner = new PdfCleanUpProcessor(stamper);
        cleaner.CleanUp();
    }
    

    or not use Redact annotations at all. After all, why add annotations only to immediately remove them again? For this purpose, PdfCleanUpProcessor has a second constructor which is given the cleanup locations directly:

    /**
     * Creates a {@link com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpProcessor} object based on the
     * given {@link java.util.List} of {@link com.itextpdf.text.pdf.pdfcleanup.PdfCleanUpLocation}s
     * representing regions to be erased from the document.
     *
     * @param pdfCleanUpLocations list of locations to be cleaned up {@see PdfCleanUpLocation}
     * @param pdfStamper          A{@link com.itextpdf.text.pdf.PdfStamper} object representing the document which redaction
     *                            applies to.
     */
    public PdfCleanUpProcessor(IList<PdfCleanUpLocation> pdfCleanUpLocations, PdfStamper pdfStamper)
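
    A minimal sketch of how this second constructor might be used with the regions from the question. It assumes that iTextSharp 5.5.x offers a PdfCleanUpLocation constructor taking the page number, the rectangle and a fill color; the class name DirectRedaction and the parameters source and dest are hypothetical:

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup;

    public static class DirectRedaction
    {
        // source and dest are hypothetical file paths supplied by the caller.
        public static void Redact(string source, string dest)
        {
            using (var pdfReader = new PdfReader(source))
            using (var stamper = new PdfStamper(pdfReader, new FileStream(dest, FileMode.Create)))
            {
                var locations = new List<PdfCleanUpLocation>();

                // Region on the first page only (coordinates taken from the question).
                // Assumed constructor: PdfCleanUpLocation(int page, Rectangle region, BaseColor cleanUpColor).
                locations.Add(new PdfCleanUpLocation(1, new Rectangle(165f, 685f, 320f, 702f), BaseColor.BLACK));

                // Header region on every page.
                for (var i = 1; i <= pdfReader.NumberOfPages; i++)
                {
                    locations.Add(new PdfCleanUpLocation(i, new Rectangle(220f, 752f, 420f, 768f), BaseColor.BLACK));
                }

                // No Redact annotations are involved, so a single pass is enough.
                var cleaner = new PdfCleanUpProcessor(locations, stamper);
                cleaner.CleanUp();
            }
        }
    }
    ```

    Because the locations are passed in directly, this variant never reads Redact annotations from the page Annots arrays.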
    

    Issue in iText(Sharp)

    The PdfCleanUpProcessor has a member dictionary clippingRects to which Redact annotation areas are added by their index in their page Annots array:

    private IList<PdfCleanUpLocation> ExtractLocationsFromRedactAnnot(int page, int annotIndex, PdfDictionary annotDict) {
        ...
        clippingRects.Add(annotIndex, markedRectangles); 
        ...
    }
    

    Therefore, if a document has Redact annotations with the same index in the Annots arrays of different pages, separate calls of this method try to add multiple entries to the member clippingRects under the same key. The .NET Dictionary class does not allow this: Add throws an exception for a duplicate key.

    Thus, for a document containing only Redact annotations, iTextSharp redaction by Redact annotations only works properly if just one page is annotated this way!

    The original development of this feature takes place in Java, and in Java clippingRects is a HashMap which allows overwriting entries, so no exceptions are thrown here. Furthermore, as the contents of clippingRects are used only in a special case (the use of RO or OverlayText in the Redact entries), the wrong entries often don't do any harm and, therefore, may not yet have been reproducibly observed.
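
    The difference between the two collection types can be reproduced without iText at all. The following is only an illustrative sketch: the class DuplicateKeyDemo and the string values are hypothetical stand-ins, not the actual clippingRects contents.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class DuplicateKeyDemo
    {
        // Simulates registering the annotation at Annots index 0 on two pages,
        // keyed by the annotation index only. Returns true if Add threw.
        public static bool AddThrowsOnSecondPage()
        {
            var clippingRects = new Dictionary<int, string>();
            try
            {
                clippingRects.Add(0, "page 1 rectangles");
                clippingRects.Add(0, "page 2 rectangles"); // same key: ArgumentException
                return false;
            }
            catch (ArgumentException)
            {
                return true;
            }
        }

        // Indexer assignment silently overwrites, like HashMap.put in Java.
        public static string IndexerOverwrites()
        {
            var clippingRects = new Dictionary<int, string>();
            clippingRects[0] = "page 1 rectangles";
            clippingRects[0] = "page 2 rectangles";
            return clippingRects[0];
        }

        public static void Main()
        {
            Console.WriteLine(AddThrowsOnSecondPage()); // True
            Console.WriteLine(IndexerOverwrites());     // page 2 rectangles
        }
    }
    ```

    Note that overwriting only avoids the exception; the entry for the first page is still lost, which matches the "wrong entries" described above.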