Using Tesseract.js with Manifest V3


I am developing a Chrome extension that reads some PDF files and runs OCR on them.

I've noticed the recent Manifest V3 changes, and I'm stuck on importing Tesseract.js.

Here is the tree of my files:

├── extension
│   ├── background.js
│   ├── content.js
│   ├── index.css
│   ├── index.html
│   ├── manifest.json
│   └── scripts
│       ├── tesseract.min.js
│       └── worker.min.js
├── extension.zip
├── ext.sh
└── public
    ├── background.js
    ├── content.js
    ├── index.css
    ├── index.html
    ├── manifest.json
    └── scripts

Here is the content of my background.js and content.js files:

//background.js
chrome.action.onClicked.addListener((tab) => {
    if (tab.url.includes('mail.google.com') || tab.url.includes('outlook.live.com')) {
        try {
            importScripts('scripts/tesseract.min.js');
        } catch (e) {
            console.error(e);
        }
        chrome.scripting.executeScript({
            target: { tabId: tab.id },
            files: ['content.js'],
            world: 'MAIN',
            allFrames: true
        });
    } else {
        console.log('Unsupported domain.');
    }
});

And the other file:

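//content.js
//function calling tesseract to try to perform ocr
//(the name runOcr is a placeholder; the original function name was not shown)
async function runOcr(blob) {
    console.log(window.Tesseract);
    // createWorker returns a Promise in Tesseract.js v5, so it has to be awaited
    const worker = await window.Tesseract.createWorker('eng');
    // recognize resolves to an object whose data property holds the text
    const { data } = await worker.recognize(blob);
    await worker.terminate();
    return data.text;
}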

When I run this code, I get the following errors in the browser console:

Refused to load the script 'https://cdn.jsdelivr.net/npm/tesseract.js@v5.0.1/dist/worker.min.js' because it violates the following Content
Uncaught DOMException: Failed to execute 'importScripts' on 'WorkerGlobalScope': The script at 'https://cdn.jsdelivr.net/npm/tesseract.js@v5.0.1/dist/worker.min.js' failed to load.

I can, however, access window.Tesseract, which returns a Tesseract object.

What did I do wrong?

I don't use Webpack or any other bundler, just raw JS.

Thanks


Solution

  • I've been using the offscreen document approach for Tesseract.js in a Chrome extension and encountered some issues. Here is my code:

    async function createTWorker(lang: string): Promise<Tesseract.Worker> {
        const worker = await Tesseract.createWorker({
            workerPath: chrome.runtime.getURL("scripts/worker.min.js"),
            langPath: chrome.runtime.getURL("scripts/languages/"),
            corePath: chrome.runtime.getURL("scripts/"),
            workerBlobURL: false,
            logger: (m: any) => console.log(m),
        });
        return worker;
    }
    
    const script = document.createElement('script');
    script.src = chrome.runtime.getURL('scripts/tesseract.min.js');
    document.head.appendChild(script);
    

    This code returns the following error:

    Uncaught Error: TypeError: x.map is not a function
    at createWorker.js:247:15
    at t.onmessage (onMessage.js:3:5)
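
A likely cause of the x.map error, assuming the Tesseract.js v5 build referenced by the CDN URL in the question: since v5, createWorker takes the language(s) as its first argument, then the OCR engine mode, then the options object. With the options object passed first, the worker receives an object where it expects a language string or array. Below is a minimal sketch of the v5-style call. The helper name createLocalWorker and its injected parameters are made up for illustration, and the paths simply mirror the ones used above.

```javascript
// Sketch only: `tesseract` is the global Tesseract object from tesseract.min.js
// and `getURL` is chrome.runtime.getURL. Both are passed in as parameters
// (an assumption of this example) so the helper can run outside an extension.
async function createLocalWorker(tesseract, getURL, lang = 'eng') {
  const options = {
    workerPath: getURL('scripts/worker.min.js'),
    langPath: getURL('scripts/languages/'),
    corePath: getURL('scripts/'),
    // Load worker.min.js directly instead of through a Blob URL wrapper.
    workerBlobURL: false,
    logger: (m) => console.log(m),
  };
  // v5 argument order: createWorker(langs, oem, options); 1 is OEM.LSTM_ONLY.
  return tesseract.createWorker(lang, 1, options);
}
```

Inside the extension this would be called as `await createLocalWorker(Tesseract, chrome.runtime.getURL, 'eng')`. The workerBlobURL: false setting is kept on purpose, so that the worker script is requested by its chrome-extension:// URL rather than wrapped in a Blob.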
    

    When I remove workerBlobURL: false, I get a different error related to the Content Security Policy (CSP):

    Refused to load the script 'chrome-extension://lloecilpelefammfhnafkjeijmokndcc/scripts/worker.min.js' because it violates the following Content Security Policy directive: "script-src 'self' 'wasm-unsafe-eval' 'inline-speculation-rules'"
    

    Thanks
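
For readers unfamiliar with the offscreen document approach mentioned above: the service worker has no DOM, so the OCR code is usually moved into an offscreen page that the service worker opens on demand. A rough sketch of that step, assuming the "offscreen" permission is declared in manifest.json and Chrome 116 or later (for chrome.runtime.getContexts). The helper name ensureOffscreenDocument, the injected chromeApi parameter, and the offscreen.html file name are all hypothetical.

```javascript
// Sketch only: `chromeApi` stands in for the global `chrome` object, and
// 'offscreen.html' is a hypothetical page that loads scripts/tesseract.min.js.
async function ensureOffscreenDocument(chromeApi, path = 'offscreen.html') {
  const url = chromeApi.runtime.getURL(path);

  // Ask the runtime whether an offscreen document for this URL is already open.
  const existing = await chromeApi.runtime.getContexts({
    contextTypes: ['OFFSCREEN_DOCUMENT'],
    documentUrls: [url],
  });
  if (existing.length > 0) return false; // already open, nothing to create

  await chromeApi.offscreen.createDocument({
    url: path,
    reasons: ['WORKERS'], // Tesseract.js spawns a Web Worker
    justification: 'Run Tesseract.js OCR in a context with a DOM',
  });
  return true; // a new offscreen document was created
}
```

From background.js this would be called as `await ensureOffscreenDocument(chrome)` before sending the image to the offscreen page with chrome.runtime.sendMessage.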