
Socket.io - using multiple nodes


So I was looking into running socket.io across multiple processes.

The guide here: https://socket.io/docs/using-multiple-nodes/ left me with some questions.

It mentions configuring nginx to load balance between socket.io processes, but further down it also mentions using the built-in cluster module in Node.js.

Am I supposed to be using nginx AND the cluster module in Node.js for this?

Also how do I tell if load balancing is working?

I tested the nginx option with two socket.io processes, each using the redis adapter and the cluster module.

This is what I had in my nginx config:

http {

    upstream io_nodes {
        ip_hash;
        server 127.0.0.1:6001;
        server 127.0.0.1:6002;
    }

    server {
        listen 3000;
        server_name example.com;
        location / {
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            proxy_http_version 1.1;
            proxy_pass http://io_nodes;
        }
    }
}

This is an example of my socket.io code (most of it taken from here: https://github.com/elad/node-cluster-socket.io):

var express = require('express'),
    cluster = require('cluster'),
    net = require('net'),
    redis = require('redis'),
    sio = require('socket.io'),
    sio_redis = require('socket.io-redis');

var port = 6001,
    num_processes = require('os').cpus().length;

if (cluster.isMaster) {
    console.log('is master 6001');
    // This stores our workers. We need to keep them to be able to reference
    // them based on source IP address. It's also useful for auto-restart,
    // for example.
    var workers = [];

    // Helper function for spawning worker at index 'i'.
    var spawn = function(i) {
        workers[i] = cluster.fork();

        // Optional: Restart worker on exit
        workers[i].on('exit', function(code, signal) {
            console.log('respawning worker', i);
            spawn(i);
        });
    };

    // Spawn workers.
    for (var i = 0; i < num_processes; i++) {
        spawn(i);
    }

    // Helper function for getting a worker index based on IP address.
    // This is a hot path so it should be really fast. The way it works
    // is by converting the IP address to a number by removing non numeric
    // characters, then compressing it to the number of slots we have.
    //
    // Compared against "real" hashing (from the sticky-session code) and
    // "real" IP number conversion, this function is on par in terms of
    // worker index distribution only much faster.
    var worker_index = function(ip, len) {
        var s = '';
        for (var i = 0, _len = ip.length; i < _len; i++) {
            if (!isNaN(ip[i])) {
                s += ip[i];
            }
        }

        return Number(s) % len;
    };

    // Create the outside facing server listening on our port.
    var server = net.createServer({ pauseOnConnect: true }, function(connection) {
        // We received a connection and need to pass it to the appropriate
        // worker. Get the worker for this connection's source IP and pass
        // it the connection.
        var worker = workers[worker_index(connection.remoteAddress, num_processes)];
        worker.send('sticky-session:connection', connection);
    }).listen(port);
} else {
    // Note we don't use a port here because the master listens on it for us.
    var app = express();

    // Here you might use middleware, attach routes, etc.

    // Don't expose our internal server to the outside.
    var server = app.listen(0, 'localhost'),
        io = sio(server);

    // Tell Socket.IO to use the redis adapter. By default, the redis
    // server is assumed to be on localhost:6379. You don't have to
    // specify them explicitly unless you want to change them.
    io.adapter(sio_redis({ host: 'localhost', port: 6379 }));

    // Here you might use Socket.IO middleware for authorization etc.
    io.on('connection', function(socket) {
        console.log('port 6001');
        console.log(socket.id);
    });
    // Listen to messages sent from the master. Ignore everything else.
    process.on('message', function(message, connection) {
        if (message !== 'sticky-session:connection') {
            return;
        }
        // Emulate a connection event on the server by emitting the
        // event with the connection the master sent us.
        server.emit('connection', connection);

        connection.resume();
    });
}

Connections worked just fine with this, although I'm testing it all locally.

How do I know if it's working properly? Every time the client connects, it seems to connect to the socket.io process on port 6001.

The client connect code connects to port 3000.


Solution

  • Am I supposed to be using nginx AND the cluster module in Node.js for this?

    If all your server processes are on one computer, you can use the cluster module without NGINX.

    If you're using multiple server computers, then you need a piece of network infrastructure like NGINX to load balance among the different servers since node.js clustering cannot do that for you.

    You can also use both together: multiple servers load balanced by something like NGINX, with each server running clustering. The key point is that node.js clustering only spreads the load among different processes on the same host.
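
    As a rough illustration of the single-host case, here is a minimal sketch of clustering without NGINX. It uses a plain HTTP server rather than Socket.IO, and startCluster and handledBy are hypothetical names made up for this example. Socket.IO with HTTP long-polling additionally needs each client to keep reaching the same worker, which is what the IP-based routing in your master process is for.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: the response body names the process that served it.
function handledBy(pid) {
    return 'handled by pid ' + pid + '\n';
}

// Hypothetical helper: in the master, fork the workers; in a worker,
// serve HTTP on the shared port. Returns which role this process took.
function startCluster(port, numWorkers) {
    numWorkers = numWorkers || os.cpus().length;

    // cluster.isMaster matches the code in the question; newer Node.js
    // releases also expose the same flag as cluster.isPrimary.
    if (cluster.isMaster) {
        for (let i = 0; i < numWorkers; i++) {
            cluster.fork();
        }
        // Replace a worker when it exits.
        cluster.on('exit', function(worker) {
            console.log('worker', worker.process.pid, 'exited, forking a new one');
            cluster.fork();
        });
        return 'master';
    }

    // All workers listen on the same port; the cluster module hands
    // incoming connections out among them.
    http.createServer(function(req, res) {
        res.end(handledBy(process.pid));
    }).listen(port);
    return 'worker';
}
```

    Calling startCluster(3000) at the bottom of the script and requesting http://localhost:3000/ repeatedly should show replies from more than one pid once several connections are in flight. The port number is only an example.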

  • Also how do I tell if load balancing is working?

    Have each process log the activity it handles and include the process ID in each log line. If you load the server with multiple simultaneous requests, you should see some of the load being handled by each process.

    If you do actual load testing, you should get significantly more throughput with clustering on and working than without it. Keep in mind that total throughput depends on where your bottlenecks are: if your server is actually database-bound and all clustered processes use the same database, you may not benefit much from clustering the node.js process. If, on the other hand, your node.js process is compute-intensive and your server has multiple cores, you may get a significant benefit from clustering.
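
    One likely reason a local test always shows the same process: nginx's ip_hash chooses the upstream from the client address, and the master in your code chooses the worker from the connection's source IP. With a single client on 127.0.0.1 (and, behind nginx, the proxy's own address as the source), every connection takes the same path. The sketch below reuses worker_index from the question; tallyByWorker and the sample addresses are made up for illustration.

```javascript
// Same hashing helper as in the question's master process.
function worker_index(ip, len) {
    var s = '';
    for (var i = 0, _len = ip.length; i < _len; i++) {
        if (!isNaN(ip[i])) {
            s += ip[i];
        }
    }
    return Number(s) % len;
}

// Hypothetical helper: count how many of the given client IPs land on
// each worker slot.
function tallyByWorker(ips, numWorkers) {
    var counts = new Array(numWorkers).fill(0);
    ips.forEach(function(ip) {
        counts[worker_index(ip, numWorkers)]++;
    });
    return counts;
}

// One local client, three connections: everything lands on one worker.
console.log(tallyByWorker(['127.0.0.1', '127.0.0.1', '127.0.0.1'], 4));
// prints [ 0, 3, 0, 0 ]

// Four different client addresses: the load spreads out.
console.log(tallyByWorker(['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4'], 4));
// prints [ 1, 1, 1, 1 ]
```

    In the real workers, logging process.pid next to socket.id in the connection handler gives the same kind of tally from live traffic, provided the test clients come from different addresses.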