tcp, jgroups, gossip

Best JGroups stack configuration for leader/follower relationship


I built a distributed system where each node can be either a leader or a follower. In most use cases, I'll have only one leader and several followers. The leader generally runs alone on its server, while the followers run on other servers (some of them sharing a JVM).

Follower nodes never have to send messages to each other; they only communicate with leader nodes. Currently, I use the tcpgossip protocol to discover the members of a cluster. My GossipRouter runs in the same JVM as the leader node. It works pretty well, and my cluster seems sufficiently stable.

As far as I understand the tcpgossip protocol, each node reaches out to the GossipRouter and fetches information from it. So in my case, all follower nodes contact the server on which the leader node is running. However, when I turn off one of the follower nodes, I see warning messages like these from the other follower nodes:

WARNING: thread=TransferQueueBundler,myCluster,ROCKET-21632 Fri Nov 18 10:22:11 CET 2016 org.jgroups.protocols.BaseBundler sendSingleMessage JGRP000029: ROCKET-21632: failed sending message to zeus-10187 (102 bytes): java.net.SocketTimeoutException: connect timed out, headers: VERIFY_SUSPECT: [VERIFY_SUSPECT: ARE_YOU_DEAD], TP: [cluster_name=myCluster]

WARNING: thread=TransferQueueBundler,myCluster,ROCKET-21632 Fri Nov 18 10:21:19 CET 2016 org.jgroups.protocols.TP sendToMembers JGRP000034: ROCKET-21632: failure sending message to zeus-10187: java.net.SocketTimeoutException: connect timed out

where ROCKET-21632 and zeus-10187 are two followers. I expected that the followers would not talk to each other because a GossipRouter is used, but that does not seem to be the case.

Is there a way to build a cluster where some nodes will never talk to each other?


Solution

  • You use TCPGOSSIP only as a discovery mechanism. So if you ran the GossipRouter on a separate node, started all the nodes and then killed the GossipRouter, things would still work (except for merging). Only discovery (and hence the joining of new members) would stop working.

    Even if you use TCPGOSSIP, members still talk to each other directly; the VERIFY_SUSPECT messages in your warnings are an example of that. If you don't want that, replace TCP as the transport with TUNNEL. All members will then send all of their messages to a GossipRouter, which forwards them to the other members. The downside is that the node hosting the GossipRouter (in your setup, the leader) gets a lot of traffic; in contrast, members talking to each other directly spreads the traffic more evenly across the cluster.

    If you want to use TUNNEL:TCPGOSSIP, I recommend using multiple GossipRouters for fault tolerance.
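
    As a rough sketch of what such a stack could look like, the configuration below uses TUNNEL as the transport and TCPGOSSIP for discovery, both pointing at two GossipRouters. The host names (router1.example.com, router2.example.com) are hypothetical placeholders, 12001 is assumed to be the port your routers listen on, and the protocols above the discovery layer are an abbreviated selection whose names and attributes may differ between JGroups versions. Treat it as a starting point to compare against the tunnel.xml shipped with your JGroups release, not as a drop-in file.

    ```xml
    <!-- Sketch only: host names are placeholders, protocol list is abbreviated -->
    <config xmlns="urn:org:jgroups">
        <!-- Transport: every message is sent to a GossipRouter, which forwards it -->
        <TUNNEL gossip_router_hosts="router1.example.com[12001],router2.example.com[12001]"/>
        <!-- Discovery: members are looked up at the same GossipRouters -->
        <TCPGOSSIP initial_hosts="router1.example.com[12001],router2.example.com[12001]"/>
        <MERGE3/>
        <FD_ALL/>
        <VERIFY_SUSPECT/>
        <!-- No IP multicast is available behind a router -->
        <pbcast.NAKACK2 use_mcast_xmit="false"/>
        <UNICAST3/>
        <pbcast.STABLE/>
        <pbcast.GMS/>
        <FRAG2/>
    </config>
    ```

    With a stack like this, followers open connections only to the GossipRouters, so failure-detection traffic such as VERIFY_SUSPECT is relayed through the routers rather than sent over direct follower-to-follower connections.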