I am looking to extract text from this image:
Specifically, I want the row under "Kills". However, I cannot seem to get accurate results.
I tried converting the image to grayscale and applying a threshold:
import { createWorker, OEM, PSM } from "tesseract.js";
import cv from "@u4/opencv4nodejs";
import fs from "node:fs/promises";
const worker = await createWorker("eng", OEM.TESSERACT_LSTM_COMBINED);
await worker.setParameters({
tessedit_char_whitelist: "0123456789",
tessedit_pageseg_mode: PSM.SINGLE_BLOCK,
});
const image = await cv.imdecodeAsync(
await fs.readFile("input.png"),
cv.IMREAD_GRAYSCALE // imdecode expects an IMREAD_* flag, not a COLOR_* conversion code
);
const threshHoldedImage =
await image.thresholdAsync(
150,
255,
cv.THRESH_BINARY
);
const encodedImage = await cv.imencodeAsync(".png", threshHoldedImage);
const {
data: { text: tierKillsText },
} = await worker.recognize(encodedImage, {
rectangle: {
top: 265,
left: 552,
width: 87,
height: 138,
},
});
console.log(tierKillsText);
// Received: 3228387
// Expected: 3328387
I have also tried applying a Gaussian blur, without success:
const sigma = 0.75;
const blurred = threshHoldedImage.gaussianBlur(new cv.Size(0, 0), sigma);
I fixed it by reading out each line individually, which seems to lead to more accurate results.
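A minimal sketch of that per-line approach, assuming the rows are evenly spaced inside the rectangle. The names `splitIntoRows`, `recognizeRows`, and `rowCount` are hypothetical helpers for illustration, not part of tesseract.js. The only library call used is `worker.recognize` with the `rectangle` option, as in the code above.

```javascript
// Split a rectangle into `rowCount` horizontal strips of (nearly) equal height.
// Assumption: the rows are evenly spaced; adjust if the layout differs.
function splitIntoRows(rectangle, rowCount) {
  const rows = [];
  for (let i = 0; i < rowCount; i++) {
    // Round both edges so the strips tile the rectangle without gaps or overlap.
    const top = rectangle.top + Math.round((i * rectangle.height) / rowCount);
    const bottom =
      rectangle.top + Math.round(((i + 1) * rectangle.height) / rowCount);
    rows.push({
      top,
      left: rectangle.left,
      width: rectangle.width,
      height: bottom - top,
    });
  }
  return rows;
}

// Run OCR once per strip and return the trimmed text of each row.
// `worker` is an already configured tesseract.js worker; `image` is the
// encoded PNG buffer produced after thresholding.
async function recognizeRows(worker, image, rectangle, rowCount) {
  const texts = [];
  for (const row of splitIntoRows(rectangle, rowCount)) {
    const {
      data: { text },
    } = await worker.recognize(image, { rectangle: row });
    texts.push(text.trim());
  }
  return texts;
}
```

It would be called with the same rectangle as before plus the number of rows visible in the image, for example `await recognizeRows(worker, pngBuffer, { top: 265, left: 552, width: 87, height: 138 }, rowCount)`, where `pngBuffer` and `rowCount` are placeholders. Since each strip now contains a single line, switching `tessedit_pageseg_mode` to `PSM.SINGLE_LINE` may also be worth trying, although I have not verified that it helps here.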