Tags: node.js, stream, unix-socket

How to properly implement Node.js communication over Unix Domain sockets?


I'm debugging my implementation of IPC between multithreaded Node.js instances. Since datagram sockets are not supported natively, I use the default stream sockets with simple application-level packaging. When two threads communicate, the server side always receives and the client side always sends.

    // Client side: writing to the transmitter socket
    // const net = require("net");
    // const transmitter = net.createConnection(SOCKETFILE);

    const outgoing_buffer = [];
    let writeable = true;

    // Queue outgoing transfers and send them one at a time, only starting the
    // next write once the previous write's callback has fired.
    const write = (transfer) => {
        if (transfer) outgoing_buffer.push(transfer);
        if (outgoing_buffer.length === 0) return;
        if (!writeable) return;
        const current = outgoing_buffer.shift();
        writeable = false;
        transmitter.write(current, "utf8", () => {
            writeable = true;
            write();
        });
    };
    // Server side: reading from the receiving socket
    // const server = net.createServer();
    // server.listen(SOCKETFILE);
    // server.on("connection", (receiver) => { ...
    // receiver.on("data", (data) => { ...
    // ... the read function is called with the data

    let incoming_buffer = "";

    // Accumulate incoming chunks and peel complete packages off the front of
    // the buffer until none is left.
    const read = (data) => {
        incoming_buffer += data.toString();
        while (true) {
            const decoded = decode(incoming_buffer);
            if (!decoded) return;
            incoming_buffer = incoming_buffer.substring(decoded.length);
            // ... digest decoded string
        }
    };

My stream is encoded into transfer packages and decoded back, with the payload JSON-stringified on the sending side and parsed back on the receiving side.
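
The encode/decode pair itself is not shown above. For illustration only, a minimal newline-delimited sketch that matches the read loop's contract (decode returns the first complete package including its delimiter, or null if no complete package has arrived yet) could look like this; the DELIMITER constant and this exact framing are assumptions, not the actual protocol:

    // Hypothetical framing helpers (assumed, not the original implementation).
    // Each transfer is one JSON document terminated by a newline, which
    // JSON.stringify never emits unescaped inside a string value.
    const DELIMITER = "\n";

    const encode = (payload) => JSON.stringify(payload) + DELIMITER;

    // Return the first complete package including its delimiter, so the caller
    // can cut exactly decoded.length characters off the front of the buffer.
    const decode = (buffer) => {
        const end = buffer.indexOf(DELIMITER);
        if (end === -1) return null;
        return buffer.substring(0, end + DELIMITER.length);
    };

With such helpers, the client would queue write(encode({ type: "status", value: 42 })) and the server would JSON.parse the decoded package before digesting it.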

Now what happens is that, from time to time, and seemingly more often under higher CPU load, the incoming_buffer picks up some random characters, displayed as ��� when logged.

Even if this happens only once in 10,000 transfers, it is a problem. I need a reliable implementation: even at maximum CPU load, the stream must contain no unexpected characters and must not get corrupted.

What could potentially cause this? What would be the proper way to implement this?


Solution

  • Okay, I found it. The Node documentation gives a hint.

    readable.setEncoding(encoding)
    

    It has to be used on the receiving socket instead of decoding manually with incoming_buffer += data.toString();

    The readable.setEncoding() method sets the character encoding for data read from the Readable stream.

    By default, no encoding is assigned and stream data will be returned as Buffer objects. Setting an encoding causes the stream data to be returned as strings of the specified encoding rather than as Buffer objects. For instance, calling readable.setEncoding('utf8') will cause the output data to be interpreted as UTF-8 data, and passed as strings. Calling readable.setEncoding('hex') will cause the data to be encoded in hexadecimal string format.

    The Readable stream will properly handle multi-byte characters delivered through the stream that would otherwise become improperly decoded if simply pulled from the stream as Buffer objects.
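
    Applied to the receiving code from the question, the fix is a one-line change on the socket; afterwards the data events deliver already-decoded strings, and the stream keeps incomplete multi-byte sequences internally until the remaining bytes arrive (sketch, reusing the receiver variable from above):

        // server.on("connection", (receiver) => { ...
        receiver.setEncoding("utf8");       // decoding now happens inside the stream
        receiver.on("data", (data) => {
            incoming_buffer += data;        // data is already a string, no toString()
            // ... same package-decoding loop as before
        });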

    So the corruption depended on the number of multi-byte characters in the stress test rather than on the CPU load: whenever a chunk boundary fell in the middle of a multi-byte UTF-8 sequence, the per-chunk toString() could not decode the incomplete character.
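
    For completeness, the failure mode can be reproduced without any socket: when a chunk boundary falls inside a multi-byte UTF-8 sequence, converting each chunk separately yields replacement characters, which is exactly what incoming_buffer += data.toString() did on an unlucky chunk split:

        const whole = Buffer.from("€");              // UTF-8 bytes e2 82 ac
        const first = whole.subarray(0, 2);          // boundary falls mid-character
        const second = whole.subarray(2);

        console.log(first.toString() + second.toString());       // replacement characters, not "€"
        console.log(Buffer.concat([first, second]).toString());  // "€"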