javascript node.js docker node-cluster

Node.js Cluster architecture: how to scale master worker


I have built a master/worker architecture on the Node.js built-in cluster module. The application uses Express to serve an API and static files, and it is deployed with Docker:

[D O C K E R: 8080] --- N ---> [W O R K E R: 3000 ]  --- 1 ---> [M A S T E R: 3001]

I have N workers in Worker.js and one master in master.js. Master and workers share common modules. The master has a core module that loads core services and exposes an API on PORT=3001; each worker loads the other APIs on PORT=3000, the port the Docker container is bound to. A routing proxy on each worker forwards requests for the core modules to the master, while all other requests are served directly on 3000.

The start script looks like this:

'use strict';
(function() {

/// node clustering
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;

if (cluster.isMaster) { // master node
    var masterConfig=require('./config/masterconfig.json');

    // Fork workers: use WORKER_NUM (a string when set) or the config value,
    // capped at the number of available CPUs.
    var maxCPUs = parseInt(process.env.WORKER_NUM || masterConfig.cluster.worker.num, 10);
    maxCPUs = Math.min(maxCPUs, numCPUs);

    for (let i = 0; i < maxCPUs; i++) {
        cluster.fork();
    }

    var MasterNode=require('./lib/master');
    var master= new MasterNode(masterConfig);
    master.start()
    .then(done=> {
        console.log(`Master ${process.pid} running on ${masterConfig.pubsub.node}`);
    })
    .catch(error=> { // cannot recover from master error
        console.error(`Master ${process.pid} error`,error.stack);
        process.exit(1);
    });
}
else if (cluster.isWorker) { // worker node
    var workerConfig=require('./config/workerconfig.json');
    var WorkerNode=require('./lib/worker');
    var worker= new WorkerNode(workerConfig);
    worker.start()
    .then(done=> {
        console.log(`Worker ${process.pid} running on ${workerConfig.pubsub.node}`);
    })
    .catch(error=> { // worker error is recoverable
        console.error(`Worker ${process.pid} error`,error.stack);
    });
}

}).call(this);

I have the following questions.

1) By default the cluster module shares the underlying HTTP connection and uses a round-robin approach to distribute requests (see here), with worker processes spawned using child_process.fork(). I do not know whether I can customize how incoming connections are distributed.

2) So far, I serve static files and templates (like pug/swig) from an Express web application on each worker on PORT=3000, which means the static routes for the web app run on every worker instance spawned. I'm not sure this is the best approach in terms of memory usage.

3) Other clustering approaches. I have asked about migrating this architecture to PM2. Although it seems promising, I'm not sure it's the best option (see here for more details).


Solution

  • The master should only care about starting the workers, shutting them down properly, and watching for signals from the host and responding accordingly. In my experience, I've had tricky bugs because I exposed an API on the master that should have been on a worker.

    If you are planning to switch to PM2, PM2 will handle your master and you will need to move that code to the worker anyway (or at least that used to be the case).
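To illustrate the first point, here is a minimal sketch of a master that does nothing except fork workers, restart them when they die, and react to signals from the host. The function names (resolveWorkerCount, runPrimary, runWorker) and the fallback to the CPU count are assumptions made for this example and are not taken from the question's code.

```javascript
'use strict';
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Clamp the requested worker count to the number of CPUs.
// Falls back to the CPU count when the value is missing or invalid (assumption).
function resolveWorkerCount(requested, cpuCount = os.cpus().length) {
  const n = Number.parseInt(requested, 10);
  if (!Number.isInteger(n) || n < 1) return cpuCount;
  return Math.min(n, cpuCount);
}

// The master only forks, restarts, and shuts down workers. It serves no API.
function runPrimary(workerCount) {
  let shuttingDown = false;

  for (let i = 0; i < workerCount; i++) cluster.fork();

  // Replace a worker that died, unless we are shutting down on purpose.
  cluster.on('exit', (worker, code, signal) => {
    if (shuttingDown) return;
    console.error(`Worker ${worker.process.pid} exited (${signal || code}), restarting`);
    cluster.fork();
  });

  // On a signal from the host (e.g. docker stop), ask every worker to
  // stop accepting connections and disconnect.
  const shutdown = () => {
    shuttingDown = true;
    for (const worker of Object.values(cluster.workers)) worker.disconnect();
  };
  process.on('SIGTERM', shutdown);
  process.on('SIGINT', shutdown);
}

// Everything that serves requests lives in the worker.
function runWorker(port) {
  http
    .createServer((req, res) => res.end(`handled by ${process.pid}\n`))
    .listen(port);
}

// Usage (left as a comment so the definitions can be loaded without forking):
// if (cluster.isMaster) runPrimary(resolveWorkerCount(process.env.WORKER_NUM));
// else runWorker(3000);
```

With this split, the core API from the question would move into runWorker, which is also the shape PM2 expects.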

    Regarding your questions:

    1. If you need to override or customize the round-robin, I assume your goal is to route traffic from the same client to the same worker, a.k.a. sticky sessions. There are ways to do so, but with limitations: if you are using a reverse proxy like nginx or haproxy in front of node (which you should) and also want sockets to work as expected (and have Docker in the game), you can't really fan out across the workers, because the IP you see (on which you would calculate the sticky session id) will always be that of your proxy or of your Docker host (even with the x-forwarded-for header), which defeats the purpose of clustering in the first place. My solution was to start each worker on its own port (e.g. 3001, 3002 ... 300N) and let nginx handle the sticky sessions.
    2. This is not a problem, but it isn't ideal. Yes, memory will go up slightly because each worker loads the routes and modules. But nginx is much faster than node at serving static files (and at handling their caching via the various HTTP headers). So you should rely on nginx for statics and keep node for dynamic requests (like /api, /login, etc.).
    3. PM2 is a good solution with many advanced features, such as reporting statistics and handling zero-downtime deployments, but it also costs money depending on which features you want to use.
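The approach from point 1, starting each worker on its own port, could look roughly like the sketch below. It relies on cluster.fork(env) to hand each worker its port through an environment variable. The variable name WORKER_PORT and the helper names are hypothetical, and the nginx side (for example an upstream using ip_hash that lists these ports) is not shown.

```javascript
'use strict';
const cluster = require('cluster');
const http = require('http');

// Ports for N workers: basePort+1 ... basePort+N (e.g. 3001 ... 300N).
function workerPorts(basePort, count) {
  return Array.from({ length: count }, (_, i) => basePort + 1 + i);
}

// Fork one worker and remember which port it owns.
// WORKER_PORT is a hypothetical variable name chosen for this sketch.
function forkOnPort(port, portsById) {
  const worker = cluster.fork({ WORKER_PORT: String(port) });
  portsById.set(worker.id, port);
  return worker;
}

function runPrimary(basePort, count) {
  const portsById = new Map();
  workerPorts(basePort, count).forEach((port) => forkOnPort(port, portsById));

  // When a worker dies, start a replacement on the same port so the
  // proxy's upstream list stays valid.
  cluster.on('exit', (worker) => {
    const port = portsById.get(worker.id);
    portsById.delete(worker.id);
    if (port !== undefined) forkOnPort(port, portsById);
  });
}

function runWorker() {
  const port = Number(process.env.WORKER_PORT);
  http
    .createServer((req, res) => {
      res.end(`worker ${process.pid} on port ${port}\n`);
    })
    .listen(port);
}

// Usage (left as a comment so the definitions can be loaded without forking):
// if (cluster.isMaster) runPrimary(3000, 4);
// else runWorker();
```

Because every worker listens on a distinct port, the cluster module's own round-robin no longer applies, and the proxy decides which worker receives each client.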