java pdf-generation bufferedimage tess4j leptonica

Is it possible to use the TessAPI1.TessPDFRendererCreate API of tess4J without creating physical files?


I am using the Tesseract Java API (tess4J) to convert TIFF images to PDFs.

This works nicely, but to use the TessAPI1.TessPDFRendererCreate API I am forced to write both the source TIFF image and the output PDF to the local file system as actual physical files.

Please note the following about the code snippet below:

  1. The input Tiff is originally a java.awt.image.BufferedImage, but I have to write it to a physical file (sourceTiffFile is a File object).

  2. I must specify a file path for the output (pdfFullFilepath is a String representing an absolute path for the new PDF file).

        try {
            ImageIO.write(bufferedImage, "tiff", sourceTiffFile);
        } catch (Exception ioe) {
            //handling code...
        }
    
        TessResultRenderer renderer = TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0);
        TessAPI1.TessResultRendererInsert(renderer, TessAPI1.TessPDFRendererCreate(pdfFullFilepath, dataPath, 0));
        int result = TessAPI1.TessBaseAPIProcessPages(handle, sourceTiffFile.getAbsolutePath(), null, 0, renderer);
    

I would really like to avoid creating physical files, but I am not sure whether that is possible with this API. Ideally, I would like to pass the TIFF as a java.awt.image.BufferedImage or a byte array and receive the output PDF as a byte array.

Any suggestions would be most welcome as always. Thank you :)


Solution

  • You can pass a Pix, which can be converted from a BufferedImage, to the ProcessPage API method, so the input does not need to be a physical file. The output, however, will still be a physical file; the Tesseract API dictates that.

    https://tesseract-ocr.github.io/tessapi/4.0.0/a01625.html

    http://tess4j.sourceforge.net/docs/docs-4.4/net/sourceforge/tess4j/TessAPI1.html

    For example:

    int result = TessAPI1.TessBaseAPIProcessPage(handle, LeptUtils.convertImageToPix(bufferedImage), page_index, "input file name", null, 0, renderer);
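
Since the renderer insists on a file, one workaround is to point it at a temporary location, read the PDF back into a byte array, and delete the file straight away. Below is a minimal sketch of that round trip using only the JDK. `TempFilePdf`, `PdfWriter`, and `renderToBytes` are hypothetical names made up for illustration and are not part of tess4J. The sketch assumes the renderer appends ".pdf" to the output base it is given, which is how the Tesseract PDF renderer behaves as far as I know. The tess4J calls appear only in a comment and are untested here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class TempFilePdf {

    /**
     * Hypothetical callback: anything that, given an output base path,
     * writes "<outputBase>.pdf" to disk (as a file-only renderer would).
     */
    @FunctionalInterface
    interface PdfWriter {
        void writeTo(String outputBase) throws IOException;
    }

    /**
     * Runs the writer against a throwaway output base inside a fresh
     * temporary directory, returns the PDF bytes, and cleans up afterwards.
     */
    static byte[] renderToBytes(PdfWriter writer) throws IOException {
        Path dir = Files.createTempDirectory("ocr-pdf-");
        Path outputBase = dir.resolve("result");
        Path pdf = dir.resolve("result.pdf"); // assumption: ".pdf" is appended to the base
        try {
            writer.writeTo(outputBase.toString());
            return Files.readAllBytes(pdf);
        } finally {
            // Assumes the writer leaves no other files behind in the directory.
            Files.deleteIfExists(pdf);
            Files.deleteIfExists(dir);
        }
    }

    // With tess4J the callback might look roughly like this (untested sketch;
    // handle, dataPath and bufferedImage come from the question's own code):
    //
    //   byte[] pdfBytes = TempFilePdf.renderToBytes(outputBase -> {
    //       TessResultRenderer renderer =
    //               TessAPI1.TessPDFRendererCreate(outputBase, dataPath, 0);
    //       TessAPI1.TessResultRendererBeginDocument(renderer, "ocr");
    //       TessAPI1.TessBaseAPIProcessPage(handle,
    //               LeptUtils.convertImageToPix(bufferedImage),
    //               0, "ocr", null, 0, renderer);
    //       TessAPI1.TessResultRendererEndDocument(renderer);
    //       TessAPI1.TessDeleteResultRenderer(renderer);
    //   });
}
```

The file still exists for a moment, so this does not remove the disk dependency; it only hides it behind a byte-array interface. Note that, as far as I know, ProcessPage (unlike ProcessPages) does not open and close the document itself, which is why the comment calls BeginDocument and EndDocument explicitly.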