I have a job server running Bull and Express.
Server requirements
Receive a request containing an object and use that object as input to a local programme (no other choice), which takes anywhere from several minutes to a couple of hours to run. The jobs MUST be run one after the other, in the order they are received (no way round this).
TL;DR Server:
const Bull = require("bull");
const express = require("express");
const { spawnSync } = require("child_process");

// Set up Bull queue and process
const osmQueue = new Bull("osm", {
  redis: {
    port: "6969",
  },
});

osmQueue.process((job) => {
  extractOms(job);
});

// Process function
const extractOms = (job) => {
  // I have tried execSync also
  spawnSync("programme", [
    "this",
    "programme",
    "takes",
    "~30",
    "minutes",
    "to",
    "run",
    "and",
    "requires",
    "full",
    "system",
    "resources (cannot share system resources with another job)",
  ]);
  return;
};

// Express server with single route
const app = express();

app.post("/create-map-job", (req, res) => {
  console.log("fired"); // Fires on first request. Blocked on second until job done
  osmQueue.add(req.body);
  res.send("job successfully queued"); // Returns on first request. Blocked on second until job done
});

app.listen(PORT, () => {
  console.log(`Listening on ${PORT}`);
});
The problem:
While a job's spawnSync() call is running, it blocks Node's event loop, so the server cannot respond to new requests or queue further jobs until the current job finishes (see the comments in the route handler above).
Things I've tried:
Using spawn() instead of spawnSync(). While this means requests are no longer blocked, it also means that all jobs execute at the same time. I have looked into Bull's concurrency, but when the child process is started asynchronously with spawn() or exec(), the job is marked as complete as soon as the programme has successfully started; it does not wait for the spawned process to finish (roughly what I mean is sketched after this list). The server therefore THINKS the job is complete, happily loads in another job, I run out of memory very quickly, and the system crashes. I cannot limit or control memory usage at all; if anything, each process needs more memory, so I must have only one running at a time.
Simply calling res.send() BEFORE osmQueue.add(). This has no effect on the behaviour.
Using the limiter: {max, duration} option on the queue. This works if I set the limiter duration to, say, 5 hours, but that massively reduces the amount of work I can get through to an unacceptably low level.
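For reference, the sketch below is roughly what the spawn()-based attempt from the first point above looked like (assuming the same "programme" and arguments as in the TL;DR server, not my exact code). Because the processor function returns before the child process exits, Bull marks the job complete straight away and is free to start the next one.

// Sketch of the spawn() attempt: the processor returns immediately,
// so Bull thinks the job is done while "programme" is still running.
const { spawn } = require("child_process");

osmQueue.process((job) => {
  const child = spawn("programme", ["...same arguments as above..."]);

  child.on("exit", (code) => {
    // By the time this fires, Bull has already marked the job complete
    // and may be running other jobs alongside this one.
    console.log(`programme exited with code ${code}`);
  });

  // Returning here, with nothing tied to the child's exit, is what lets
  // multiple jobs pile up and exhaust memory.
});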
I have been reading up on this and searching for quite some time, but I cannot find a question similar to mine.
Questions:
How can I run each job's system process one at a time, in the order received, without blocking the server from queuing more jobs or responding to requests?
Let me know if there is anything else I can add to this and I will do so quickly.
TL;DR REQUIREMENT:
Execute a system process as part of a job, in order, one after the other without blocking the server from queuing more jobs or responding to requests while the existing job is running.
Solved; it is amazing how writing out a question in full can inspire the brain and make you look at things again. Leaving this here for future Googlers.
See this section of the Bull docs: https://github.com/OptimalBits/bull#separate-processes
I needed to use Bull's separate processes (sandboxed processors). This lets me run the blocking code in a process separate from the Node/Express process, which means future requests are not blocked even though synchronous code is running.
// osm.processor.js
const { spawnSync } = require("child_process");

module.exports = function extractOms(job) {
  spawnSync("programme", [
    "this",
    "programme",
    "takes",
    "~30",
    "minutes",
    "to",
    "run",
    "and",
    "requires",
    "full",
    "system",
    "resources (cannot share system resources with another job)",
  ]);
  return;
};

// queue.js
osmQueue.process(
  "/path/to/file/above/job-server/processors/osm.processor.js"
);
This spawns the blocking work in a separate process. Thanks Bull!
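One refinement I may add later (a sketch of my own, not something from the Bull docs): spawnSync() returns the child's exit status, so the sandboxed processor can throw when the programme fails, which should let Bull mark the job as failed rather than completed.

// osm.processor.js (variant sketch: surface programme failures to Bull)
const { spawnSync } = require("child_process");

module.exports = function extractOms(job) {
  const result = spawnSync("programme", ["...same arguments as above..."]);

  // spawnSync() returns { status, error, ... }; throwing here marks the
  // job as failed in Bull instead of completed.
  if (result.error) {
    throw result.error;
  }
  if (result.status !== 0) {
    throw new Error(`programme exited with code ${result.status}`);
  }
};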