I recently read about Node's "worker_threads" module that allows parallel execution of Javascript code in multiple threads which is useful for CPU-intensive operations. (NOTE: these are not web workers made by Chrome in the browser)
I'm building a feature where I need to do a massive amount of Postgres INSERTs without blocking the browser.
The problem is: in my Javascript files where I instantiate the worker, I'm not allowed to import anything, including native Node modules or NPM libraries like Knex.js which is necessary to do database queries. I get an error that says: Cannot use import statement outside a module as soon as the file is executed.
I've tried putting the worker code in another file with an import statement at the top (same error). I've tried giving the Knex object to workerData but it cannot clone a non-native JS object.
I'm out of ideas- does anyone know how to interact with a database in a worker thread if we can't import any NPM libraries?!?!
// mainThread.js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
import knex from 'knex'; // --> *** UNCAUGHT EXCEPTION: Cannot use import statement outside a module ***
if (isMainThread) {
module.exports = async function runWorker (rowsToInsert = []) {
return new Promise((resolve, reject) => {
const worker = new Worker(__filename, { workerData: { rowsToInsert } });
worker.on('message', (returningRows) => resolve(returningRows));
worker.on('error', reject);
worker.on('exit', (code) => {
if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
});
});
};
} else {
const { rowsToInsert } = workerData;
return knex('table').insert(rowsToInsert)
.then((returningRows) => {
parentPort.postMessage({ data: returningRows });
});
}
I am following a tutorial from this webpage: https://blog.logrocket.com/use-cases-for-node-workers/
It is of course possible, but it's a very bad idea.
Database drivers are already asynchronous and non-blocking of the JavaScript thread. Moving your insert calls to a separate thread as you propose will not only get you no performance gains, it will actually decrease overall performance because of the overhead involved with interthread communication:
rowsToInsert
must be copied, which is (relatively) expensive.Generally, the only time it's really appropriate to use JS threads is when your JavaScript code is performing CPU-intensive work. The node docs say as much right at the top:
Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.
This means if you're doing a lot of parsing, math, or similar, it may be appropriate to do the work in a thread. However, simply shoveling data from one place to another (ie, I/O) is not a good candidate for a thread — after all, node's design is tuned to be efficient at this kind of work.
You don't say where your rowsToInsert
come from, but if it's coming in from HTTP request(s), a thread is the wrong thing to use. However, if you're parsing eg a CSV or JSON file on the server, it may be worthwhile to do that in a thread, but it's important that the thread does all the work (so memory need not be moved between threads). The message you post to the worker should just be "process the file located at /foo/bar.csv", and then the worker thread does the rest.
The error you're getting is the same that you'd get without worker threads: you're trying to use import
in a regular, non-module JS file. Either rename the worker file to *.mjs or use require('knex')
instead.
node's ES module documentation goes into detail about how import
differs from require
.