javascript node.js ocr roi text-recognition

How to get the coordinates of text on an image with Node.js?


I am trying to get the x and y coordinates of specific text on an image. On this image I am trying to detect where the text X:input Y:input is located; it could be anywhere on future images. In this case I would expect the result to be around 714, 164, 125, 32 (x, y, width, height).

I tried to use Tesseract and Jimp:

const worker = await Tesseract.createWorker();

await worker.loadLanguage("eng");
await worker.initialize("eng");

const convertedImage = await image
  .grayscale()
  .getBufferAsync(Jimp.MIME_PNG);

await worker.setParameters({ tessedit_char_whitelist: "XY0123456789" });

const { data } = await worker.recognize(convertedImage);

But I am not sure whether anything in data allows me to get the desired result. I am not aware of other libraries that might help me.


Solution

  • Updated response

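    Even with the contrast set to 20% (0.2), the text was still not being picked up. Setting it to 10% (0.1) worked. With PSM.SPARSE_TEXT, the block in data.blocks whose text matches carries a bbox, from which the position and size can be derived.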

    import path from "path";
    import Jimp from "jimp";
    import { createWorker, PSM } from "tesseract.js";
    
    const __dirname = path.resolve();
    
    const main = async () => {
      const imagePath = path.join(__dirname, "image.png");
      const bounds = await getBoundingBox(imagePath, "X323Y528", "XY0123456789");
    
      console.log("Bounds:", bounds); // { x: 719, y: 173, width: 116, height: 16 }
    };
    
    const getBoundingBox = async (imagePath, searchText, allowedCharacters) => {
      const worker = await createWorker();
    
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
    
      await worker.setParameters({
        tessedit_char_whitelist: allowedCharacters,
        tessedit_pageseg_mode: PSM.SPARSE_TEXT,
      });
    
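      // Preprocess the image before OCR: desaturate, adjust contrast, invert.
      const image = await Jimp.read(imagePath);
      const imageBuffer = await image
        .color([{ apply: "desaturate", params: [90] }])
        .contrast(0.1)
        .invert()
        .write("processed.jpg") // saved only so the result can be inspected
        .getBufferAsync(Jimp.MIME_PNG);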
    
      const { data } = await worker.recognize(imageBuffer);
    
      const bounds = data.blocks
        ?.filter(({ text }) => text.trim() === searchText)
        .map(({ bbox }) => ({
          x: bbox.x0,
          y: bbox.y0,
          width: bbox.x1 - bbox.x0,
          height: bbox.y1 - bbox.y0,
        }))
        .at(0);
    
      await worker.terminate();
    
      return bounds;
    };
    
    (async () => {
      await main();
    })();
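
If Tesseract splits the text into several pieces, the exact block match above may find nothing. The following is a minimal sketch of a more tolerant lookup, not part of the original answer. It assumes that `data` also exposes a `words` array whose items have `text` and `bbox: { x0, y0, x1, y1 }`, as in the tesseract.js versions that provide `loadLanguage`/`initialize`; newer releases may structure the result differently. The helper name `findTextBounds` is hypothetical.

```javascript
// Hypothetical helper. `data` is assumed to look like the result of
// worker.recognize(): { words: [{ text, bbox: { x0, y0, x1, y1 } }, ...] }.
const findTextBounds = (data, searchText) => {
  // Strip all whitespace so "X323 Y528" and "X323Y528" compare equal.
  const normalize = (s) => s.replace(/\s+/g, "");
  const target = normalize(searchText);
  const words = data.words ?? [];

  // Try every run of consecutive words, in case the text was split.
  for (let start = 0; start < words.length; start++) {
    let joined = "";
    for (let end = start; end < words.length; end++) {
      joined += normalize(words[end].text);

      if (joined === target) {
        // Merge the boxes of all words in the run into one rectangle.
        const boxes = words.slice(start, end + 1).map((w) => w.bbox);
        const x0 = Math.min(...boxes.map((b) => b.x0));
        const y0 = Math.min(...boxes.map((b) => b.y0));
        const x1 = Math.max(...boxes.map((b) => b.x1));
        const y1 = Math.max(...boxes.map((b) => b.y1));
        return { x: x0, y: y0, width: x1 - x0, height: y1 - y0 };
      }

      // Stop extending this run once it can no longer match the target.
      if (!target.startsWith(joined)) break;
    }
  }

  return undefined; // same "not found" result as .at(0) on an empty array
};
```

It could replace the `data.blocks` chain in `getBoundingBox`, for example `const bounds = findTextBounds(data, searchText);`.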
    

    Original response

    You will need to crop the image to the region that contains the text.

    The image is too noisy, even if you convert it to greyscale.

    Also, you can set tessedit_pageseg_mode to PSM.SINGLE_LINE.

    import path from "path";
    import Jimp from "jimp";
    import { createWorker, PSM } from "tesseract.js";
    
    const __dirname = path.resolve();
    
    const main = async () => {
      const position = await getPosition(
        path.join(__dirname, "image.png"),
        700,
        160,
        150,
        40
      );
    
      console.log(position); // { x: 323, y: 528 }
    };
    
    const getPosition = async (imagePath, xOffset, yOffset, width, height) => {
      const worker = await createWorker({
        logger: (m) => {
          // console.log(m);
        },
      });
    
      await worker.loadLanguage("eng");
      await worker.initialize("eng");
      await worker.setParameters({
        tessedit_char_whitelist: "XY0123456789:",
        tessedit_pageseg_mode: PSM.SINGLE_LINE,
      });
    
      const image = await Jimp.read(imagePath);
      const convertedImage = image
        .grayscale()
        .contrast(0.3)
        .crop(
          xOffset ?? 0,
          yOffset ?? 0,
          width ?? image.bitmap.width,
          height ?? image.bitmap.height
        )
        .write("greyscale.jpg");
      const base64 = await convertedImage.getBase64Async(Jimp.AUTO);
    
      const {
        data: { text },
      } = await worker.recognize(base64);
    
      // Extract the two numbers from text such as "X:323Y:528"; fall back to -1.
      const [x, y] = text
        .match(/X:(\d+)Y:(\d+)/)
        ?.slice(1)
        ?.map((v) => parseInt(v, 10)) || [-1, -1];
    
      await worker.terminate();
    
      return { x, y };
    };
    
    (async () => {
      await main();
    })();
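
Note that this version returns the numbers printed in the text, not where the text sits on the image. If a bounding box is read from the OCR result of the cropped image, its coordinates would be relative to the crop, so the crop offsets need to be added back. A minimal sketch of that conversion follows; the helper name `toImageCoordinates` is hypothetical, and the `bbox` shape (`x0`, `y0`, `x1`, `y1`) is assumed to match the one used in the updated response.

```javascript
// Hypothetical helper: converts a bounding box reported relative to a cropped
// region back into coordinates of the full image.
const toImageCoordinates = (bbox, xOffset = 0, yOffset = 0) => ({
  x: bbox.x0 + xOffset, // shift right by the crop's left edge
  y: bbox.y0 + yOffset, // shift down by the crop's top edge
  width: bbox.x1 - bbox.x0, // size is unaffected by the crop offset
  height: bbox.y1 - bbox.y0,
});

// Illustrative values only: a box found inside a crop that starts at (700, 160).
const example = toImageCoordinates({ x0: 19, y0: 13, x1: 135, y1: 29 }, 700, 160);
console.log(example); // { x: 719, y: 173, width: 116, height: 16 }
```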