I am using the JeroMQ server socket model, specifically the ROUTER/DEALER pattern, because I want to validate the identity of the clients. My problem is that I notice random latency spikes of up to 500 ms when I run a loop in which the server binds the socket and the client connects: the delay jumps from negligible to roughly 500 ms. Why is this happening? How can I avoid such latency? Is it possible at all? What am I doing wrong? Here is my simple code to reproduce it.
package sockets;

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

import java.nio.charset.StandardCharsets;

import static sockets.rtdealer.NOFLAGS;

public class ZmqStack {

    public static void main(String[] args) throws InterruptedException {

        Thread brokerThread = new Thread(() -> {
            while (true) {
                try (ZContext context = new ZContext()) {
                    ZMQ.Socket broker = context.createSocket(SocketType.ROUTER);
                    broker.bind("tcp://*:5555");

                    String identity = new String(broker.recv());
                    String data1 = new String(broker.recv());
                    String identity2 = new String(broker.recv());
                    String data2 = new String(broker.recv());

                    System.out.println("Identity: " + identity + " Data: " + data1);
                    System.out.println("Identity: " + identity2 + " Data: " + data2);

                    broker.sendMore(identity.getBytes(ZMQ.CHARSET));
                    broker.send("xxx1".getBytes(StandardCharsets.UTF_8));
                    broker.sendMore(identity2.getBytes(ZMQ.CHARSET));
                    broker.send("xxx12");

                    broker.close();
                    context.destroy();
                }
            }
        });
        brokerThread.setName("broker");

        Thread workerThread = new Thread(() -> {
            while (true) {
                try (ZContext context = new ZContext()) {
                    ZMQ.Socket worker = context.createSocket(SocketType.DEALER);
                    String identity = "identity1";
                    worker.setIdentity(identity.getBytes(ZMQ.CHARSET));
                    worker.connect("tcp://localhost:5555");

                    worker.send("Hello1".getBytes(StandardCharsets.UTF_8));
                    String workload = new String(worker.recv(NOFLAGS));
                    System.out.println(Thread.currentThread().getName() + " - Received " + workload);
                }
            }
        });
        workerThread.setName("worker");

        Thread workerThread1 = new Thread(() -> {
            while (true) {
                try (ZContext context = new ZContext()) {
                    ZMQ.Socket worker = context.createSocket(SocketType.DEALER);
                    worker.setIdentity("Identity2".getBytes(ZMQ.CHARSET));
                    worker.connect("tcp://localhost:5555");

                    long start = System.currentTimeMillis();
                    worker.send("Hello2 " + Thread.currentThread().getName());
                    String workload = new String(worker.recv(NOFLAGS));
                    long finish = System.currentTimeMillis();
                    long timeElapsed = finish - start;

                    System.out.println(Thread.currentThread().getName() + " - Received " + workload);
                    System.out.println("Elapsed Time: " + timeElapsed);
                }
            }
        });
        workerThread1.setName("worker1");

        workerThread1.start();
        workerThread.start();
        brokerThread.start();
    }
}
Q1 : "(...) why is this happening?"
This mainly happens because you repeatedly pay the cost of assembling and then immediately disposing of the whole Context-based thread-local engines ( not to mention the repeatedly induced, expensive and slow garbage collections ). More details on smaller-scale latency effects follow below.
Each and every instantiation is expensive ( both in the TimeDOMAIN ~ CPU-wise, in the SpaceDOMAIN ~ MEM-I/O-wise, and in the lost ability of cache latency-masking ), so unless forbidden by some reason not mentioned here, all ZeroMQ infrastructure instantiations are best placed to happen only once, at initiation, and to remain allocated, configured, used, reused and operated throughout the whole life-span of the processing, be it one, ten or a hundred Context()-engines with their sub-instantiated Socket()-archetypes and their respective, TransportClass-specific AccessPoint-engines.
So the biggest and cheapest improvement will come from moving these repetitive ZeroMQ-infrastructure re-instantiations out of the infinite loops: one-shot, ASAP creation plus performance-boosted configuration, with teardown deferred ALAP and avoided as much as possible. It also helps to redefine the ZeroMQ-frame payload into a single, byte-mapped frame, which avoids all the delays of blocking-blind multi-frame dances, so your system cannot be driven into a self-inflicted deadlock by the first ill-composed message ( who would ever want that? ).
This is where low-latency, reasonably robust distributed designs start.
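For illustration, here is a minimal sketch of one worker refactored along these lines - the ZContext and the DEALER socket are created once and reused across every request, and the payload is a single frame. The endpoint, the identity string and the timing printout are just placeholders carried over from the question, not a prescribed API:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class ReusedDealerWorker {
    public static void main(String[] args) {
        // Create the Context and the Socket exactly once and keep them alive
        // for the whole life-span of the worker - no per-request teardown.
        try (ZContext context = new ZContext()) {
            ZMQ.Socket worker = context.createSocket(SocketType.DEALER);
            worker.setIdentity("identity1".getBytes(ZMQ.CHARSET));
            worker.connect("tcp://localhost:5555");

            while (true) {
                long start = System.nanoTime();
                worker.send("Hello".getBytes(ZMQ.CHARSET), 0);          // single-frame payload
                String reply = new String(worker.recv(0), ZMQ.CHARSET); // blocking receive
                long elapsedUs = (System.nanoTime() - start) / 1_000;
                System.out.println("Received " + reply + " in " + elapsedUs + " us");
            }
        } // Socket closed and Context destroyed once, on shutdown only.
    }
}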
Q2 : "(...) how can I avoid such latency?"
My shopping list continues ( assuming the hints above have already been refactored in ) :
-- use less expensive TransportClasses - avoiding the meaningless costs of TCP/IP assembly / packeting / buffering / decoding / disassembly ( all done by the O/S stacks, so completely outside the native powers of the ZeroMQ zero-copy, Zen-of-Zero design philosophy ) makes sense for all localhost inter-thread transports ( best use the inproc://, ipc:// or tipc:// TransportClasses, which spend less to both send and deliver - see the first sketch after this list )
-- use any performance-related Context()-instance parameters to configure, boost and rebalance your application's needs for high-performance, low-latency Signalling & Messaging. We may increase the number of internal Context-I/O-threads, and we can even make them hardware-bound, reserving some CPU-cores for exclusively serving (only) our Signalling & Messaging meta-plane, to shave even the localhost operating system's "cooperative scheduling" noise off our intended high-performance, low-latency productivity envelope ( see the second sketch after this list )
-- use similarly any of the .setsockopt() configurations ( where available on all distributed-system nodes; be warned that some wrappers cut corners of the original native API, so test the version-differences on both sides - classically, for the costs of topic-filtering, some wrappers degenerate topic-filtering so that it works only under a private assumption of using multi-frame payloads instead, which another language wrapper on a localhost or remote node need not know about beforehand, or ever ). Watch out for these possible sender-side and receiver-side wrapper side-effects: what works among jeromq-wrapper-equipped code need not work the same way with other wrappers - there are many examples of what failed this way ( see the third sketch after this list )
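For the first point, a minimal inproc:// sketch, assuming both sides live in one process: inproc endpoints exist inside a single Context, so both threads must share the same ZContext and the ROUTER has to bind before any DEALER connects. The endpoint name "inproc://stack" and the identity are just placeholder values:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class InprocSketch {
    public static void main(String[] args) throws InterruptedException {
        try (ZContext context = new ZContext()) {
            // inproc:// endpoints live inside a single Context, so both threads
            // share this one ZContext and the ROUTER binds before any connect.
            ZMQ.Socket broker = context.createSocket(SocketType.ROUTER);
            broker.bind("inproc://stack");

            Thread worker = new Thread(() -> {
                ZMQ.Socket dealer = context.createSocket(SocketType.DEALER);
                dealer.setIdentity("identity1".getBytes(ZMQ.CHARSET));
                dealer.connect("inproc://stack");
                dealer.send("Hello");
                System.out.println("worker got: " + dealer.recvStr());
            }, "worker");
            worker.start();

            String identity = broker.recvStr();   // identity frame prepended by ROUTER
            String payload  = broker.recvStr();   // application payload
            broker.sendMore(identity);
            broker.send("reply to " + payload);

            worker.join();                        // let the worker drain its reply first
        }
    }
}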
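For the second point, a minimal sketch of a Context tuned at construction time. The I/O-thread count (4) and the affinity bit-mask are illustrative values only, and the setAffinity() call assumes the jeromq build in use exposes the native ZMQ_AFFINITY socket option:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class TunedContextSketch {
    public static void main(String[] args) {
        // Ask the Context for more than the default single I/O thread,
        // so heavy traffic is not serialised behind one engine thread.
        try (ZContext context = new ZContext(4)) {
            ZMQ.Socket broker = context.createSocket(SocketType.ROUTER);
            // Pin this socket's traffic to I/O thread #0 via a bit-mask
            // ( mirrors the native ZMQ_AFFINITY socket option ).
            broker.setAffinity(1L);
            broker.bind("tcp://*:5555");
            // ... serve requests on the tuned socket ...
        }
    }
}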
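And for the third point, a sketch of a few .setsockopt()-style tunings as exposed by the jeromq wrapper; the concrete values are placeholders and every option should be verified against the wrapper versions actually deployed on both ends:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class SocketOptionSketch {
    public static void main(String[] args) {
        try (ZContext context = new ZContext()) {
            ZMQ.Socket worker = context.createSocket(SocketType.DEALER);
            worker.setLinger(0);       // do not block on close waiting for unsent messages
            worker.setImmediate(true); // queue only onto fully established connections
            worker.setSndHWM(1_000);   // bound the send-side queue depth
            worker.setRcvHWM(1_000);   // bound the receive-side queue depth
            worker.connect("tcp://localhost:5555");
            // ... send / recv as before ...
        }
    }
}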
These rules of thumb are worth starting with.