Tags: java, pdfbox, apache-tika, jai

How to fix "Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed"


I am setting up a Java project where I use PDFBox to extract images from PDF files. Since I already use tika-app for my other functions, I decided to use the PDFBox bundled inside tika-app-1.20.jar.

I first tried adding jai-imageio-core-1.3.1.jar explicitly. Since tika-app already comes bundled with this jar, I then tried with the tika-app jar alone.

The line that throws the error:

PDXObject object = resources.getXObject(cosName);

The stack trace of the error:

org.apache.pdfbox.filter.MissingImageReaderException: Cannot read JPEG2000 image: Java Advanced Imaging (JAI) Image I/O Tools are not installed
    at org.apache.pdfbox.filter.Filter.findImageReader(Filter.java:163)
    at org.apache.pdfbox.filter.JPXFilter.readJPX(JPXFilter.java:115)
    at org.apache.pdfbox.filter.JPXFilter.decode(JPXFilter.java:64)
    at org.apache.pdfbox.cos.COSInputStream.create(COSInputStream.java:77)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:175)
    at org.apache.pdfbox.cos.COSStream.createInputStream(COSStream.java:163)
    at org.apache.pdfbox.pdmodel.common.PDStream.createInputStream(PDStream.java:236)
    at org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject.<init>(PDImageXObject.java:140)
    at org.apache.pdfbox.pdmodel.graphics.PDXObject.createXObject(PDXObject.java:70)
    at org.apache.pdfbox.pdmodel.PDResources.getXObject(PDResources.java:426)

I am fairly sure jai-imageio-core is bundled in tika-app, yet it appears to be invisible when I run the code.


Solution

  • I ran into this error as well; it is mentioned in the PDFBox documentation. jai-imageio-core alone is not enough, because the JPEG 2000 reader lives in the separate jai-imageio-jpeg2000 artifact. You need to add the following dependencies to your pom.xml:

    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-core</artifactId>
        <version>1.4.0</version>
    </dependency>
    
    <dependency>
        <groupId>com.github.jai-imageio</groupId>
        <artifactId>jai-imageio-jpeg2000</artifactId>
        <version>1.3.0</version>
    </dependency>
    
    <!-- Optional: avoids the same error with JBIG2 images -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>jbig2-imageio</artifactId>
        <version>3.0.3</version>
    </dependency>
    

    If you are using Gradle:

    dependencies {
        implementation 'com.github.jai-imageio:jai-imageio-core:1.4.0'
        implementation 'com.github.jai-imageio:jai-imageio-jpeg2000:1.3.0'
    
        // Optional: avoids the same error with JBIG2 images
        implementation 'org.apache.pdfbox:jbig2-imageio:3.0.3'
    }
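
To confirm that the plugins are actually visible at runtime, a small check like the one below can help. It is only a sketch: the class name `ImageReaderCheck` and the helper `hasReader` are made up for illustration, and the format names "JPEG2000" and "JBIG2" are assumed to be the names PDFBox asks ImageIO for, which matches the error message above but should be verified against your PDFBox version.

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class ImageReaderCheck {

    /** Returns true if ImageIO can find at least one reader for the given format name. */
    static boolean hasReader(String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        return readers.hasNext();
    }

    public static void main(String[] args) {
        // Ask ImageIO to rescan the application classpath for plugins.
        ImageIO.scanForPlugins();

        // Format names are assumptions; see the note above.
        for (String format : new String[] {"JPEG2000", "JBIG2"}) {
            System.out.println(format + " reader available: " + hasReader(format));
        }
    }
}
```

Run it with the same classpath as the failing code. If the JPEG2000 line still reports false after adding the dependencies, the jars are probably not reaching the runtime classpath, or a shaded jar has dropped their service registration files.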