I'm using Tess4J (a JNA wrapper around Tesseract) and trying to call tess.doOCR(myFile) to OCR text from a single-page PDF.
I have Ghostscript installed (via yum install ghostscript), and gs -h works correctly.
My app server is using a 64-bit JVM, and I have gsdll64.dll and the 64-bit Tesseract DLLs liblept168.dll and libtesseract302.dll in the classpath.
When tess.doOCR(myFile) is called, this is logged:
GPL Ghostscript 8.70 (2014-09-22)
Copyright (C) 2014 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
But then it just stops there. The program doesn't go any further.
UPDATE --
It looks like the real issue is from this error:
java.lang.UnsatisfiedLinkError: Unable to load library 'tesseract': Native library (linux-x86-64/libtesseract.so) not found in resource path
After looking around a lot, I don't see a convenient place to find this libtesseract.so file, and I'm not sure what it takes to get it onto my Linux app server. I read that maybe I need to download some C++ runtime, but I don't see a Linux download for that. Any advice would be much appreciated.
Or is this something to do with a symbolic link?
The fix was simple for me: just run sudo apt-get install tesseract-ocr from the command line. On Linux you don't need to worry about the DLL libraries or the JVM version. Installing Tesseract with apt-get will do the trick.
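If you want to confirm that the package install actually put a Tesseract shared library on the machine, something like the following might help. It is only a rough sketch: the class name NativeLibFinder is made up for illustration, and the directories listed in main are my guesses at common library locations, which will differ between distributions. It uses only the standard library (no Tess4J or JNA calls) and looks for files whose name starts with whatever System.mapLibraryName("tesseract") returns on the current platform, which is libtesseract.so on Linux.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tess4J or JNA.
class NativeLibFinder {

    // Returns every file in the given directories whose name starts with
    // the platform-specific file name for libName (e.g. "libtesseract.so").
    static List<Path> find(String libName, List<Path> dirs) {
        String prefix = System.mapLibraryName(libName);
        List<Path> hits = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip directories that do not exist on this system
            }
            // The trailing * also matches versioned names such as "libtesseract.so.3".
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
                for (Path p : stream) {
                    hits.add(p);
                }
            } catch (IOException e) {
                // Unreadable directory: ignore it and keep looking elsewhere.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed locations only; adjust for your distribution.
        List<Path> dirs = Arrays.asList(
                Paths.get("/usr/lib"),
                Paths.get("/usr/lib64"),
                Paths.get("/usr/lib/x86_64-linux-gnu"),
                Paths.get("/usr/local/lib"));
        List<Path> hits = find("tesseract", dirs);
        if (hits.isEmpty()) {
            System.out.println("No " + System.mapLibraryName("tesseract") + " found in " + dirs);
        } else {
            for (Path p : hits) {
                System.out.println("Found: " + p);
            }
        }
    }
}
```

If nothing is found, the UnsatisfiedLinkError from the question is expected, since there is no native library for JNA to load. If a file does turn up in a directory outside the default search path, JNA also reads the jna.library.path system property, so pointing that property at the directory is one option to try.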