I want to extract the results of a match, which I only have as an image. Below is the code I'm using to read text from the image. I also tried it in Python and got the same result. How can I improve the output, or is there a better approach for my problem?
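import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public String getImgText(String imageLocation) {
    ITesseract instance = new Tesseract();
    try {
        instance.setDatapath("/tessdata");
        instance.setLanguage("eng");
        return instance.doOCR(new File(imageLocation));
    } catch (TesseractException e) {
        // Report the cause instead of silently discarding it
        System.err.println(e.getMessage());
        return "Error while reading image";
    }
}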
The output is completely different from the input:
unnl lE
mam-m m,
mun-m, 1 ms "mm M
W urn-mm my A mm“ m
mus-1mm 1 m- m m
mfinlln um: ”mu“ m
ilk-M m.
mwnm mu 5 mm nu-
..mn. n w. tvhrzmr- m
2 rm.“- 0 w, mama: m.
mum-mp 5 mu mum n.
a bulb-h» m
tum-3mm nun mm,” M
3 mmn m; mum“ M
Ema W 7 a“. m
mzsm 5m mm»... m
Continue
The input image is:
You should preprocess the image before running Tesseract (Python code using the OpenCV library):
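import cv2

# Load the screenshot ("input.png" is a placeholder path, use your own file)
image = cv2.imread("input.png")
# Convert to grayscale, then invert so light text on a dark background becomes dark on light
img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
result = cv2.bitwise_not(img)
# Push near-white pixels to pure white to clean up the background
result[result >= 190] = 255

# To show the image
cv2.imshow("Threshold", result)
cv2.waitKey()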
Resulting in something like this:
Additionally, the English traineddata seems to handle the PUBG font poorly, so you might want to look into fine-tuning it: Training eng.traineddata for PUBG font
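Since the question uses Java, here is a rough sketch of the same preprocessing (grayscale, invert, push values >= 190 to white) using only the standard BufferedImage API, so no OpenCV dependency is needed. The class name OcrPreprocess and the method preprocess are made up for illustration, and the grayscale weights are the common 0.299/0.587/0.114 luma approximation, so the result may differ slightly from what OpenCV produces.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical helper, not part of Tess4J or OpenCV.
final class OcrPreprocess {

    // Returns a grayscale, inverted copy of src where every
    // inverted value >= 190 is set to pure white (255).
    static BufferedImage preprocess(BufferedImage src) {
        int width = src.getWidth();
        int height = src.getHeight();
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = out.getRaster();

        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;

                // Integer luma approximation, rounded to the nearest value
                int gray = (299 * r + 587 * g + 114 * b + 500) / 1000;

                // Invert, then clamp near-white values to white
                int inverted = 255 - gray;
                if (inverted >= 190) {
                    inverted = 255;
                }

                // Write the raw sample directly to avoid color-space conversion
                raster.setSample(x, y, 0, inverted);
            }
        }
        return out;
    }
}
```

To use it with the code from the question, you could load the file with ImageIO.read(new File(imageLocation)), pass the result through OcrPreprocess.preprocess, and hand the returned image to instance.doOCR, which in Tess4J also accepts a BufferedImage. The 190 cutoff is simply the value from the Python snippet above and will likely need tuning for other screenshots.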