I am trying to get the x and y coordinates of specific text on an image like this one. I want to detect where `X:input Y:input` is located, and on future images it could be anywhere. In this case I would expect it to be at around 714, 164, 125, 32 (x, y, width, height).
I tried using Tesseract.js and Jimp:
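```javascript
import Jimp from "jimp";
import Tesseract from "tesseract.js";

const worker = await Tesseract.createWorker();
await worker.loadLanguage("eng");
await worker.initialize("eng");
const image = await Jimp.read("image.png");
const convertedImage = await image
  .grayscale()
  .getBufferAsync(Jimp.MIME_PNG);
// Only the characters that can appear in "X:123 Y:456"-style labels.
await worker.setParameters({ tessedit_char_whitelist: "XY0123456789:" });
const { data } = await worker.recognize(convertedImage);
```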
But I am not sure whether anything in `data` gives me that position, and I don't know of any other libraries that could help.
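For reference: in tesseract.js v4 (the API style used here), `data` does carry positions. `data.words` (like `data.lines` and `data.blocks`) holds entries with a `text` and a `bbox` of corners `{ x0, y0, x1, y1 }`; check the shape your version returns. The sketch below assumes that shape. `findTextBox` is a hypothetical helper, not a tesseract.js function: it keeps the words matching a regular expression and returns one rectangle enclosing them in the `(x, y, width, height)` form above. The stand-in `data` object uses made-up numbers, not real OCR output.

```javascript
/**
 * Hypothetical helper (not part of tesseract.js): find the recognized words
 * whose text matches `pattern` and return one rectangle enclosing all of them
 * as { x, y, width, height }, or null if no word matches.
 * Assumes the tesseract.js v4 shape: data.words[i].text and
 * data.words[i].bbox = { x0, y0, x1, y1 } (top-left and bottom-right corners).
 */
function findTextBox(data, pattern) {
  const boxes = (data.words ?? [])
    .filter(({ text }) => pattern.test(text.trim()))
    .map(({ bbox }) => bbox);
  if (boxes.length === 0) return null;

  // Union of the matched boxes, then convert corners to x/y/width/height.
  const x0 = Math.min(...boxes.map((b) => b.x0));
  const y0 = Math.min(...boxes.map((b) => b.y0));
  const x1 = Math.max(...boxes.map((b) => b.x1));
  const y1 = Math.max(...boxes.map((b) => b.y1));
  return { x: x0, y: y0, width: x1 - x0, height: y1 - y0 };
}

// Hand-written stand-in for `data` (made-up numbers, not real OCR output).
const fakeData = {
  words: [
    { text: "X:323", bbox: { x0: 100, y0: 50, x1: 150, y1: 66 } },
    { text: "Y:528", bbox: { x0: 158, y0: 51, x1: 210, y1: 67 } },
    { text: "Menu", bbox: { x0: 10, y0: 10, x1: 60, y1: 28 } },
  ],
};

console.log(findTextBox(fakeData, /^[XY]:\d+$/));
// { x: 100, y: 50, width: 110, height: 17 }
```

Taking the union matters because Tesseract may split `X:323 Y:528` into two words; the enclosing rectangle then covers the whole label.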
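Even with a contrast of 20%, the text was still not being picked up; setting it to 10% worked. The code below desaturates the image, applies the 10% contrast, inverts it, and runs Tesseract in sparse-text mode before matching the recognized blocks against the search text.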
```javascript
import path from "path";
import { fileURLToPath } from "url";
import Jimp from "jimp";
import { createWorker, PSM } from "tesseract.js";

// ES modules have no __dirname; derive it from the module's URL.
const __dirname = path.dirname(fileURLToPath(import.meta.url));

const main = async () => {
  const imagePath = path.join(__dirname, "image.png");
  const bounds = await getBoundingBox(imagePath, "X323Y528", "XY0123456789");
  console.log("Bounds:", bounds); // { x: 719, y: 173, width: 116, height: 16 }
};

const getBoundingBox = async (imagePath, searchText, allowedCharacters) => {
  const worker = await createWorker();
  try {
    await worker.loadLanguage("eng");
    await worker.initialize("eng");
    await worker.setParameters({
      tessedit_char_whitelist: allowedCharacters,
      // Sparse text: find as much text as possible, in no particular order.
      tessedit_pageseg_mode: PSM.SPARSE_TEXT,
    });

    const image = await Jimp.read(imagePath);
    const imageBuffer = await image
      .color([{ apply: "desaturate", params: [90] }]) // remove most of the colour
      .contrast(0.1) // 0.2 did not work for this image; 0.1 did
      .invert()
      .write("processed.jpg") // debug copy of the preprocessed image
      .getBufferAsync(Jimp.MIME_PNG);

    const { data } = await worker.recognize(imageBuffer);

    // Take the first block whose text matches exactly and convert its
    // corners (x0, y0, x1, y1) to { x, y, width, height }.
    return data.blocks
      ?.filter(({ text }) => text.trim() === searchText)
      .map(({ bbox }) => ({
        x: bbox.x0,
        y: bbox.y0,
        width: bbox.x1 - bbox.x0,
        height: bbox.y1 - bbox.y0,
      }))
      .at(0);
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
};

main().catch(console.error);
```
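Since 20% contrast failed and 10% worked for this image, the right value may differ on future images. One option, sketched here under assumptions, is to try several settings in turn and stop at the first one that finds the text. `firstSuccess` is a hypothetical helper; in practice `attempt` would wrap the Jimp preprocessing and `worker.recognize` step (for example a `getBoundingBox` variant that takes the contrast as an extra parameter, which the code above does not do). The OCR call is faked here so the sketch runs on its own.

```javascript
/**
 * Hypothetical helper: try each preprocessing setting in order and return
 * { setting, result } for the first one where `attempt` resolves to a
 * non-null result, or null if none does.
 */
async function firstSuccess(settings, attempt) {
  for (const setting of settings) {
    // Sequential on purpose: later settings are only tried if earlier ones fail.
    const result = await attempt(setting);
    if (result != null) return { setting, result };
  }
  return null;
}

// Stand-in for "preprocess with this contrast, OCR, search for the text":
// pretend the text is only found at contrast 0.1.
const fakeAttempt = async (contrast) =>
  contrast === 0.1 ? { x: 100, y: 50, width: 120, height: 16 } : null;

firstSuccess([0.3, 0.2, 0.1, 0.0], fakeAttempt).then(console.log);
```

Each attempt costs a full OCR pass, so a short list ordered from the most likely value is preferable to a fine-grained sweep.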
You will need to crop the text out of the image: the full image is too noisy, even after converting it to greyscale. You can also set `tessedit_pageseg_mode` to `PSM.SINGLE_LINE`, since the cropped region contains only one line of text.
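The code that follows crops a hand-picked region (700, 160, 150, 40). Since the question says the label could be anywhere on future images, one way to keep the "crop first" approach is to scan the image in overlapping crops and OCR each one. This is a sketch, not part of the answer; `tileRegions` and `toImageCoords` are hypothetical helpers. With an overlap at least as large as the label's width and height, the label lies entirely inside at least one region.

```javascript
/**
 * Hypothetical helper: split a width x height image into overlapping crop
 * rectangles. Any text box no wider and no taller than `overlap` falls
 * entirely inside at least one of the returned regions.
 */
function tileRegions(width, height, tileW, tileH, overlap) {
  const stepX = tileW - overlap;
  const stepY = tileH - overlap;
  if (stepX <= 0 || stepY <= 0) {
    throw new Error("overlap must be smaller than the tile size");
  }
  const regions = [];
  for (let y = 0; y < height; y += stepY) {
    for (let x = 0; x < width; x += stepX) {
      regions.push({
        x,
        y,
        width: Math.min(tileW, width - x), // clamp at the right edge
        height: Math.min(tileH, height - y), // clamp at the bottom edge
      });
      if (x + tileW >= width) break; // this tile already reaches the right edge
    }
    if (y + tileH >= height) break; // this row already reaches the bottom edge
  }
  return regions;
}

// A bbox found inside a crop is relative to that crop; shift it back by the
// crop's offset to get full-image coordinates.
const toImageCoords = (region, bbox) => ({
  x: region.x + bbox.x0,
  y: region.y + bbox.y0,
  width: bbox.x1 - bbox.x0,
  height: bbox.y1 - bbox.y0,
});

console.log(tileRegions(300, 200, 150, 100, 50).length); // 9
```

With Jimp, each region could be cut out with `image.clone().crop(r.x, r.y, r.width, r.height)`; the clone is needed because Jimp's `crop` modifies the image in place. More regions mean more OCR passes, so the tile size is a trade-off between speed and how little noise each crop contains.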
```javascript
import path from "path";
import { fileURLToPath } from "url";
import Jimp from "jimp";
import { createWorker, PSM } from "tesseract.js";

// ES modules have no __dirname; derive it from the module's URL.
const __dirname = path.dirname(fileURLToPath(import.meta.url));

const main = async () => {
  // Crop a 150x40 region starting at (700, 160), which contains the label.
  const position = await getPosition(
    path.join(__dirname, "image.png"),
    700,
    160,
    150,
    40
  );
  console.log(position); // { x: 323, y: 528 }
};

const getPosition = async (imagePath, xOffset = 0, yOffset = 0, width, height) => {
  const worker = await createWorker();
  try {
    await worker.loadLanguage("eng");
    await worker.initialize("eng");
    await worker.setParameters({
      tessedit_char_whitelist: "XY0123456789:",
      // The cropped region holds a single line of text.
      tessedit_pageseg_mode: PSM.SINGLE_LINE,
    });

    const image = await Jimp.read(imagePath);
    const convertedImage = image
      .grayscale()
      .contrast(0.3)
      .crop(
        xOffset,
        yOffset,
        // By default, keep everything to the right of and below the offset.
        width ?? image.bitmap.width - xOffset,
        height ?? image.bitmap.height - yOffset
      )
      .write("greyscale.jpg"); // debug copy of the cropped image

    const base64 = await convertedImage.getBase64Async(Jimp.AUTO);
    const {
      data: { text },
    } = await worker.recognize(base64);

    // Extract the two numbers from text such as "X:323 Y:528";
    // fall back to [-1, -1] if the pattern is not found.
    const [x, y] = text
      .match(/X:\s*(\d+)\s*Y:\s*(\d+)/)
      ?.slice(1)
      .map((v) => parseInt(v, 10)) ?? [-1, -1];
    return { x, y };
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
};

main().catch(console.error);
```
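The regular expression is the fragile part of this approach, because Tesseract may insert spaces around the colons or between the two values. A slightly more tolerant parser is sketched below; `parseCoordinates` is a hypothetical helper, and returning `null` instead of `[-1, -1]` is a design choice that forces the caller to handle "not found" explicitly.

```javascript
/**
 * Hypothetical helper: parse "X:323 Y:528"-style OCR text into numbers,
 * tolerating optional whitespace around the colons and between the values.
 * Returns null when the pattern is not found.
 */
function parseCoordinates(text) {
  const match = /X\s*:\s*(\d+)\s*Y\s*:\s*(\d+)/.exec(text);
  if (!match) return null;
  return { x: Number.parseInt(match[1], 10), y: Number.parseInt(match[2], 10) };
}

console.log(parseCoordinates("X:323 Y:528\n")); // { x: 323, y: 528 }
console.log(parseCoordinates("X: 323  Y: 528")); // { x: 323, y: 528 }
console.log(parseCoordinates("no label here")); // null
```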