I am using Tess-Two to build OCR for Android. When I run the same image through both, the Android result is very different from the output of Tesseract on desktop.
The desktop version of Tesseract gives a much better result.
On Android I am using the following lines:
val baseApi = TessBaseAPI()
baseApi.init(dirPath, "eng")
baseApi.setImage(mustOpen)
val recognizedText = baseApi.utF8Text
On desktop, I just run this simple command:
tesseract image.png result
The sample image is:
The output for the image using tesseract for Desktop is:
VEGETABLE OF, RIVET een Sra) SUGAR, EDIBLE
VEGETABLE OIL, INVERT SUGAR S' SUGAR, CITRIC
RAISING 503 (ii), BAKING }, SALT,
SOLIDS (0.6 % [ DL-ACETYL TARTARIC
ACID ESTERS OF ‘AND
And the output using Tess-Two on Android is this:
'm mm W7 ' ' iii-E:
mmmmfiwgmb Ian»: came
a” ( om | mmmfiéu
mmormuguomws _
Won mm .. . . ml
mumm I'm‘n
( .
This is complete gibberish. Please help.
I commented on your post earlier and have just solved this for myself, so I thought I'd share.
The first problem for me was that the image needs to be preprocessed to get better results. I'm using OpenCV for the preprocessing. https://android.jlelse.eu/a-beginners-guide-to-setting-up-opencv-android-library-on-android-studio-19794e220f3c is a good example of how to set it up.
Next, the image needs to be converted into a binary (black-and-white) image. For me, the following gives the best results:
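// Load the source image (here a drawable resource) as an OpenCV Mat.
Mat plateMat = Utils.loadResource(this, R.drawable.plate);
// Convert to grayscale.
Mat gray = new Mat();
Imgproc.cvtColor(plateMat, gray, Imgproc.COLOR_BGR2GRAY);
// Light blur to suppress noise before thresholding.
Mat blur = new Mat();
Imgproc.GaussianBlur(gray, blur, new Size(3, 3), 0);
// Adaptive mean threshold (block size 75, constant 10). THRESH_BINARY_INV followed
// by bitwise_not gives the same result as a plain THRESH_BINARY.
Mat thresh = new Mat();
Imgproc.adaptiveThreshold(blur, thresh, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY_INV, 75, 10);
Core.bitwise_not(thresh, thresh);
// Copy the binary image into a Bitmap that can be passed to Tesseract.
Bitmap bmp = Bitmap.createBitmap(thresh.width(), thresh.height(), Bitmap.Config.ARGB_8888);
Utils.matToBitmap(thresh, bmp);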
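To make it clearer what the adaptive threshold step does, here is a minimal plain-Java sketch of mean adaptive thresholding on a small grayscale array. It is not OpenCV's implementation, only an illustration of the rule: a pixel becomes white if it is brighter than the mean of its neighbourhood minus a constant. The class and method names are made up for this example, the border is handled by clamping coordinates, and rounding may differ slightly from what OpenCV does.

```java
import java.util.Arrays;

public class AdaptiveMeanThresholdSketch {

    /**
     * Binarises a grayscale image (values 0..255). A pixel becomes 255 if it is
     * brighter than (mean of its blockSize x blockSize neighbourhood - c), else 0.
     * blockSize is assumed to be odd. Coordinates outside the image are clamped
     * to the nearest edge pixel.
     */
    public static int[][] adaptiveMeanThreshold(int[][] gray, int blockSize, int c) {
        int h = gray.length;
        int w = gray[0].length;
        int r = blockSize / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                for (int dy = -r; dy <= r; dy++) {
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = Math.min(Math.max(y + dy, 0), h - 1);
                        int xx = Math.min(Math.max(x + dx, 0), w - 1);
                        sum += gray[yy][xx];
                    }
                }
                double mean = (double) sum / (blockSize * blockSize);
                out[y][x] = gray[y][x] > mean - c ? 255 : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 5x5 light background (200) with a single dark "ink" pixel (50) in the middle.
        int[][] gray = new int[5][5];
        for (int[] row : gray) {
            Arrays.fill(row, 200);
        }
        gray[2][2] = 50;

        int[][] binary = adaptiveMeanThreshold(gray, 3, 10);
        for (int[] row : binary) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this toy input only the dark centre pixel ends up black, because it is well below its local mean, while the background pixels next to it stay white. That local comparison is why this kind of threshold copes with uneven lighting better than a single global cut-off.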
Then I initialise Tesseract with the eng+osd languages (in this order). You can find the trained data files here: https://github.com/tesseract-ocr/tessdata
The Tesseract call itself looks like this:
TessBaseAPI tesseract = new TessBaseAPI();
tesseract.setDebug(true);
tesseract.init(getFilesDir().getAbsolutePath(),"eng+osd");
tesseract.setImage(bmp);
String utf8 = tesseract.getUTF8Text();
NOW THE REAL DEAL
The real reason I got a different result in the end is simply that the Tesseract version installed with Homebrew on my Mac was 4.1.0, while the official Tess-Two repo still uses 3.05. Digging through the repo's issues, I found that there is a newer version built on Tesseract 4, but it lives in a different repo: https://github.com/adaptech-cz/Tesseract4Android
Once I cloned it and used the AAR extracted from the project, the results matched the desktop output and I can finally sleep in peace!
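If you want to guard against this kind of mismatch, a small check like the following could help. It is only a sketch: the class and method names are hypothetical, and the version strings are hard-coded from the numbers above. In a real app you would take the Android one from whatever version string your Tesseract wrapper reports, and the desktop one from the output of tesseract --version.

```java
public class TesseractVersionCheck {

    /** Returns the leading major number of a version string such as "4.1.0" or "3.05". */
    public static int majorVersion(String version) {
        String trimmed = version.trim();
        int end = 0;
        while (end < trimmed.length() && Character.isDigit(trimmed.charAt(end))) {
            end++;
        }
        if (end == 0) {
            throw new IllegalArgumentException("No leading version number in: " + version);
        }
        return Integer.parseInt(trimmed.substring(0, end));
    }

    /** True if both version strings share the same major version. */
    public static boolean sameMajor(String a, String b) {
        return majorVersion(a) == majorVersion(b);
    }

    public static void main(String[] args) {
        String desktop = "4.1.0"; // Homebrew install mentioned above
        String android = "3.05";  // version bundled with Tess-Two mentioned above
        if (!sameMajor(desktop, android)) {
            System.out.println("Engine mismatch: desktop " + desktop + " vs Android " + android);
        }
    }
}
```

Comparing only the major version is a deliberate simplification: the big difference in output came from the jump between Tesseract 3 and Tesseract 4, so that is the mismatch worth flagging first.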