I have an MRC-compressed PDF (the images are JPX-encoded) that I cannot redact with iText 7 pdfSweep, because an ImageReadException is thrown.
Caused by: org.apache.commons.imaging.ImageReadException: Can't parse this format.
at org.apache.commons.imaging.Imaging.getImageParser(Imaging.java:731)
at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:703)
at org.apache.commons.imaging.Imaging.getImageInfo(Imaging.java:637)
at com.itextpdf.pdfcleanup.PdfCleanUpFilter.processImage(PdfCleanUpFilter.java:343)
... 13 more
Do you know any workaround or solution for this issue? An obvious workaround would be to replace the jp2 (jpx) in the PDF with some other image format and perform the redaction on this modified PDF, however, in this case the benefits of MRC compression are lost, not to mention the overall speed of such conversion and then redaction.
(iText developer here)
As you can see, iText uses org.apache.commons to handle the images.
In the past we have had some problems with known bugs in this external library.
A possible solution is to fork the org.apache.commons project, implement a fix, and submit your pull request.
This way, everyone benefits, and the change would automatically be included in iText as well.
Of course, should you be a paying customer, then reporting this problem through the iText support board might trigger us to do the pull request instead.
As for a workaround, I think you've already suggested the appropriate idea: find the JPX images, re-encode them in another format, and then run pdfSweep.
In more detail (steps 1 and 2): using IEventListener, you can obtain the underlying BufferedImage of a given resource, and you can then use a ByteArrayOutputStream and ImageIO to re-encode your image as a standard jpg or png. You can then use iText to change the dictionary entry for this particular resource.
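The re-encoding part of that can be sketched roughly as follows. This is only a minimal sketch using the standard Java library, not iText code: the class name ImageReencoder and the method toPng are hypothetical names made up for illustration. It assumes you already have a BufferedImage in hand (in iText 7, PdfImageXObject.getBufferedImage() is one way to get it from the image passed to your IEventListener). Note that decoding JPX in the first place may require a JPEG 2000 reader plugin for ImageIO, since the standard JDK does not ship one.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of iText: re-encodes an already decoded image as PNG.
final class ImageReencoder {
    private ImageReencoder() {}

    // Returns the PNG-encoded bytes of the given image.
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no registered writer can handle the image.
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available for this image type");
        }
        return out.toByteArray();
    }
}
```

The resulting bytes are what you would then put into the PDF in place of the original JPX stream, updating the image dictionary entries to match the new encoding.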