
Node.js Clusters - why is the script running multiple times with no obvious reason?


I've written the code below into a file named MyScript.js (taken directly from the Node.js docs at https://nodejs.org/api/cluster.html, plus one extra console.log at the top):

const cluster = require('cluster');
const http = require('http');
const numCPUs = require('os').cpus().length;

console.log("I'm at the top"); //PAY ATTENTION TO THIS LINE

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  // Fork workers.
  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker, code, signal) => {
    console.log(`worker ${worker.process.pid} died`);
  });
} else {
  // Workers can share any TCP connection
  // In this case it is an HTTP server
  http.createServer((req, res) => {
    res.writeHead(200);
    res.end('hello world\n');
  }).listen(8000);

  console.log(`Worker ${process.pid} started`);
}

When I run it with

>node MyScript.js

it outputs the following in the command prompt:

I'm at the top...
Master 24632 is running
I'm at the top...
Worker 25536 started
I'm at the top...
Worker 17524 started
I'm at the top...
Worker 19020 started
I'm at the top...
Worker 11352 started

How can it run both the if branch and the else branch? Where is the magic that causes MyScript.js to run multiple times?

Update (tl;dr): The real answer, which I have since worked out, is that the "magic" happens because cluster.fork() in effect says: create a new Node.js process (with its own V8 instance) to execute "some JavaScript module", which by default is the one named by process.argv[1]. That happens to be the full path to MyScript.js.


Solution

  • Think of a cluster as a set of forked processes. Each iteration of the for loop calls cluster.fork(), which creates a child process that runs the same script. When a child reaches the if statement, cluster.isMaster is false, so it executes the else branch. Only the cluster's master (there is exactly one), which created the children, executes the if branch.

    The number of workers equals the number of logical CPU cores reported by os.cpus().length, so if, for example, your machine has 8 logical cores, you would expect 8 workers to be running (in addition to the master).

    So basically, you expect the following to run only once:

    console.log(`Master ${process.pid} is running`);
    
    // Fork workers.
    for (let i = 0; i < numCPUs; i++) {
      cluster.fork();
    }
    
    cluster.on('exit', (worker, code, signal) => {
      console.log(`worker ${worker.process.pid} died`);
    });
    

    but the following to run multiple times (once per worker, that is, as many times as there are CPUs in your computer, because that is how you configured it):

    // Workers can share any TCP connection
    // In this case it is an HTTP server
    http.createServer((req, res) => {
      res.writeHead(200);
      res.end('hello world\n');
    }).listen(8000);
    
    console.log(`Worker ${process.pid} started`);
    

    Then, when a worker (a child process, not the master) exits, the master's exit handler prints the following:

    console.log(`worker ${worker.process.pid} died`);
    

    EDIT:

    So why does console.log("I'm at the top"); run multiple times? Unlike fork in C, a fork in Node creates a new process with a new instance of the V8 engine, so the whole script is executed from the beginning in every process you create. The function name fork is misleading: it does not copy your process and continue from the point of the call, but rather spawns a new process, which runs the whole script again.