
Tess4J - Native library (linux-x86-64/libtesseract.so) not found in resource path


I'm using Tess4J (JNA wrapper around tesseract), and trying to call tess.doOCR(myFile) to OCR text from a single-page PDF.

I have Ghostscript installed (via yum install ghostscript), and gs -h works correctly.

My app server is using a 64-bit JVM, and I have gsdll64.dll and the 64-bit Tesseract DLLs liblept168.dll and libtesseract302.dll on the classpath.

When tess.doOCR(myFile) is called, this is logged:

GPL Ghostscript 8.70 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1

But then it just stops there. The program doesn't go any further.

UPDATE --

It looks like the real issue is from this error:

java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path

After looking around a lot, I don't see a convenient place to find this libtesseract.so file, and I'm not sure what it takes to get this onto my Linux app server. I read that maybe I need to download some C++ runtime, but I don't see a Linux download for that. Any advice would be much appreciated.

Or is this something to do with a symbolic link?
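
To check whether a libtesseract.so file (or a versioned variant such as a symlink target) is present on the server at all, a small standalone scan can help. This is only a sketch: the class name and the list of directories are assumptions, not locations that Tess4J or JNA is documented to use, so adjust them for your distribution.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindTesseractLib {

    // Returns every entry in the given directories whose file name
    // starts with "libtesseract.so" (e.g. a versioned library or a symlink).
    static List<Path> findCandidates(List<Path> dirs) {
        List<Path> found = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // directory does not exist on this machine
            }
            try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "libtesseract.so*")) {
                for (Path p : stream) {
                    found.add(p);
                }
            } catch (IOException e) {
                // Unreadable directory: skip it and keep scanning the rest.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed candidate directories; these vary by distribution.
        List<Path> dirs = Arrays.asList(
            Paths.get("/usr/lib"),
            Paths.get("/usr/lib64"),
            Paths.get("/usr/local/lib"),
            Paths.get("/usr/lib/x86_64-linux-gnu"));

        List<Path> found = findCandidates(dirs);
        if (found.isEmpty()) {
            System.out.println("No libtesseract.so* found in the scanned directories.");
        }
        for (Path p : found) {
            String kind = Files.isSymbolicLink(p) ? "symlink" : "file";
            System.out.println(p + " (" + kind + ")");
        }
    }
}
```

If nothing is listed, the native library is probably not installed; if only versioned files show up, the symbolic link question becomes worth investigating.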


Solution

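  • The fix was simple for me: run sudo apt-get install tesseract-ocr from the command line. On Linux you don't need to worry about the DLLs or the JVM version, since the .dll files are only used on Windows. Installing Tesseract through the package manager installs the native Tesseract library that Tess4J needs. (On a yum-based distribution, install the equivalent package with yum.)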
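
If the error persists after installing the package, one thing to try is pointing JNA at the directory that holds the library through the jna.library.path system property. The following is a minimal sketch, not part of the original fix: the class name, helper name, and the default /usr/lib64 directory are assumptions, and the property should be set before Tess4J first loads the native library.

```java
import java.io.File;

public class JnaLibraryPath {

    // Prepends dir to the jna.library.path system property and
    // returns the resulting value.
    static String prependJnaLibraryPath(String dir) {
        String current = System.getProperty("jna.library.path");
        String updated = (current == null || current.isEmpty())
                ? dir
                : dir + File.pathSeparator + current;
        System.setProperty("jna.library.path", updated);
        return updated;
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real directory as the first argument.
        String dir = args.length > 0 ? args[0] : "/usr/lib64";
        System.out.println("jna.library.path = " + prependJnaLibraryPath(dir));
        // Tess4J calls such as tess.doOCR(myFile) would follow from here.
    }
}
```

The same property can also be passed on the JVM command line with -Djna.library.path=..., which avoids changing application code.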