java swing tesseract tess4j

Why can't I catch the TesseractException?


I am using Tess4j to run Tesseract OCR, with the following code:

Code sample

During testing I wanted to exercise the catch clause, so I fed invalid input to Tesseract, which should result in a TesseractException. I managed to induce a TesseractException from the createDocuments() method. Here is the stack trace: Console Output

Note that the stack trace includes line 125 of doOcr(), which is inside the try-catch block. Even though the console shows a TesseractException being thrown, execution moves on to line 126 and returns true.

I use net.sourceforge.tess4j.Tesseract to start the OCR process. I also tried net.sourceforge.tess4j.Tesseract1, which produced the same red console output from Tess4j, but no TesseractException.

My question is: what am I doing wrong? I assume the issue is in my code, because a TesseractException is being thrown but my code is not catching it.


Solution

  • Look at the source code of Tesseract.java:

    @Override
    public void createDocuments(String[] filenames, String[] outputbases, List<RenderedFormat> formats) throws TesseractException {
        if (filenames.length != outputbases.length) {
            throw new RuntimeException("The two arrays must match in length.");
        }
    
        init();
        setTessVariables();
    
        try {
            for (int i = 0; i < filenames.length; i++) {
                File workingTiffFile = null;
                try {
                    String filename = filenames[i];
    
                    // if PDF, convert to multi-page TIFF
                    if (filename.toLowerCase().endsWith(".pdf")) {
                        workingTiffFile = PdfUtilities.convertPdf2Tiff(new File(filename));
                        filename = workingTiffFile.getPath();
                    }
    
                    TessResultRenderer renderer = createRenderers(outputbases[i], formats);
                    createDocuments(filename, renderer);
                    api.TessDeleteResultRenderer(renderer);
                } catch (Exception e) {
                    // skip the problematic image file
                    logger.error(e.getMessage(), e);
                } finally {
                    if (workingTiffFile != null && workingTiffFile.exists()) {
                        workingTiffFile.delete();
                    }
                }
            }
        } finally {
            dispose();
        }
    }
    
    /**
     * Creates documents.
     *
     * @param filename input file
     * @param renderer renderer
     * @throws TesseractException
     */
    private void createDocuments(String filename, TessResultRenderer renderer) throws TesseractException {
        api.TessBaseAPISetInputName(handle, filename); //for reading a UNLV zone file
        int result = api.TessBaseAPIProcessPages(handle, filename, null, 0, renderer);
    
        if (result == ITessAPI.FALSE) {
            throw new TesseractException("Error during processing page.");
        }
    }
    

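    The exception is thrown at line 579, in the private createDocuments(String, TessResultRenderer) method. That method is called from the public method above it, at line 551, inside a try-catch block whose catch body (line 555) only calls logger.error(e.getMessage(), e). The exception is therefore logged and skipped inside the library, and it never reaches your catch clause.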
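    To see this control flow in isolation, here is a minimal sketch that does not use Tess4j at all. OcrException, SwallowingBatch and SwallowDemo are hypothetical names invented for this illustration; they only mirror the shape of the library code quoted above (a public method that declares the checked exception, and an inner catch (Exception e) that logs it and continues).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a library's checked exception (not the real TesseractException).
class OcrException extends Exception {
    OcrException(String message) {
        super(message);
    }
}

// Hypothetical stand-in that mirrors the control flow of the batch method quoted above.
class SwallowingBatch {
    // Stands in for the logger: collects the messages of swallowed exceptions.
    final List<String> logged = new ArrayList<>();

    // Declares the checked exception, yet never lets one escape.
    void processAll(String[] names) throws OcrException {
        for (String name : names) {
            try {
                processOne(name);
            } catch (Exception e) {
                // Same shape as the library's catch block: record the error and continue.
                logged.add(e.getMessage());
            }
        }
    }

    // Fails for an empty name, to simulate a problematic input file.
    private void processOne(String name) throws OcrException {
        if (name.isEmpty()) {
            throw new OcrException("Error during processing page.");
        }
    }
}

class SwallowDemo {
    // Returns true when the caller's catch block did NOT run.
    static boolean callerSawNoException(SwallowingBatch batch, String[] names) {
        try {
            batch.processAll(names);
            return true;
        } catch (OcrException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SwallowingBatch batch = new SwallowingBatch();
        boolean result = callerSawNoException(batch, new String[] {"ok.png", ""});
        System.out.println("caller returned " + result + ", logged: " + batch.logged);
    }
}
```

    The caller's catch block never runs even though an exception was thrown and recorded, which matches the behaviour described in the question: the error appears in the output, and the calling method still returns true.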

    Now the question is: what do you really want to achieve?

    If you just don't want to see this log output, you can configure slf4j so that log messages from this library are not printed.

    If you want to catch the actual exception, that is not possible, because the library swallows it. I am not familiar with the library, but from the code there doesn't seem to be a clean option: the method that throws the exception is private and is called from only this one place, inside the try-catch block. However, the exception is thrown when api.TessBaseAPIProcessPages(...) returns ITessAPI.FALSE, and api has a getter. So you could obtain it, call TessBaseAPIProcessPages(...) yourself, and check the result. This may not be ideal, as you would probably end up processing every image twice. Another solution is to fork the source code and modify it yourself. You might also contact the author and ask for advice; you could take it further and submit a pull request for them to approve and release.
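    The "check the result yourself" idea can be sketched as follows. This is only an illustration of the pattern, not Tess4j code: PageProcessor, OcrFailedException and StatusCheckDemo are hypothetical names, the int constants are assumed to play the role of ITessAPI.TRUE and ITessAPI.FALSE, and the fake processor stands in for the real call to TessBaseAPIProcessPages(...).

```java
// Hypothetical stand-in for the native page-processing call; not the Tess4j API.
interface PageProcessor {
    int FALSE = 0; // assumed to play the role of ITessAPI.FALSE
    int TRUE = 1;  // assumed to play the role of ITessAPI.TRUE

    int processPages(String filename);
}

// Hypothetical checked exception standing in for TesseractException.
class OcrFailedException extends Exception {
    OcrFailedException(String message) {
        super(message);
    }
}

class StatusCheckDemo {
    // Checks the status code directly and throws, so the caller can catch the failure.
    static void processOrThrow(PageProcessor processor, String filename) throws OcrFailedException {
        int result = processor.processPages(filename);
        if (result == PageProcessor.FALSE) {
            throw new OcrFailedException("Error during processing page: " + filename);
        }
    }

    // Caller-side method in the spirit of the question's doOcr(): true on success, false on failure.
    static boolean doOcr(PageProcessor processor, String filename) {
        try {
            processOrThrow(processor, filename);
            return true;
        } catch (OcrFailedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Fake processor for the demo: only ".tif" inputs succeed.
        PageProcessor fake = name -> name.endsWith(".tif") ? PageProcessor.TRUE : PageProcessor.FALSE;
        System.out.println(doOcr(fake, "scan.tif"));
        System.out.println(doOcr(fake, "broken.xyz"));
    }
}
```

    Because the status check now lives in your own code rather than behind the library's catch block, the failure reaches your catch clause. With the real library you would still need to obtain the api object and perform the same setup the library does before processing, which this sketch leaves out.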