high-availability failover activemq-artemis jgroups

Achieve high availability and failover in Artemis Cluster with shared-store HA policy through JGroups protocol


The ActiveMQ Artemis documentation states that when high availability is configured with the replication HA policy, you can specify a group of live servers that a backup server can connect to. This is done by configuring group-name in the master and slave elements of broker.xml. A backup server will only connect to a live server that shares the same group name.

But the shared-store policy has no such concept as group-name, and I am confused. If I want to achieve high availability through shared-store with JGroups, how can it be done?

When I instead tried the replication HA policy with group-name, the cluster formed and failover worked, but I got these warnings:

2020-10-02 16:35:21,517 WARN  [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=220da24b-049c-11eb-8da6-0050569b585d
2020-10-02 16:35:21,517 WARN  [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=220da24b-049c-11eb-8da6-0050569b585d
2020-10-02 16:35:25,350 WARN  [org.apache.activemq.artemis.core.server] AMQ224078: The size of duplicate cache detection (<id_cache-size/>) appears to be too large 20,000. It should be no greater than the number of messages that can be squeezed into confirmation window buffer (<confirmation-window-size/>) 32,000.

Solution

  • As the name "shared-store" indicates, the live and the backup broker become a logical pair which can support high availability and fail-over because they share the same data store. Because they share the same data store there is no need for any kind of group-name configuration. Such an option would be confusing, redundant, and ultimately useless.

    The JGroups configuration (and the cluster-connection more generally) exists because the two brokers need to exchange information with each other about their respective network locations so that the live broker can inform clients how to connect to the backup in case of a failure.

    Regarding the WARN message about duplicate node ids on the network: you might get that message once, possibly twice, during failover or fail-back, but if you see it more often than that then something is wrong. If you're using shared-store, it indicates a problem with the locks on the shared file system. If you're using replication, it indicates a potential misconfiguration or possibly a split-brain.
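
As a rough illustration of the points above, a shared-store pair with JGroups discovery might be configured along the following lines in each broker's broker.xml. This is only a sketch under stated assumptions: the shared path /mnt/shared/artemis, the host name, the JGroups stack file name jgroups-stack.xml, the channel name, and the names bg-group, dg-group, my-cluster and netty-connector are all placeholders, not values from the question. Note that there is no group-name anywhere; the pairing comes from both brokers pointing at the same directories.

```xml
<!-- Fragment of broker.xml (inside <core>) for the LIVE broker. All names and paths are illustrative. -->

<!-- Both brokers must point at the same shared storage. -->
<paging-directory>/mnt/shared/artemis/paging</paging-directory>
<bindings-directory>/mnt/shared/artemis/bindings</bindings-directory>
<journal-directory>/mnt/shared/artemis/journal</journal-directory>
<large-messages-directory>/mnt/shared/artemis/large-messages</large-messages-directory>

<ha-policy>
   <shared-store>
      <!-- On the backup broker, use <slave> here instead of <master>,
           e.g. with <allow-failback>true</allow-failback>. -->
      <master>
         <failover-on-shutdown>true</failover-on-shutdown>
      </master>
   </shared-store>
</ha-policy>

<connectors>
   <!-- This broker's own address, advertised to the other broker and to clients. -->
   <connector name="netty-connector">tcp://live-host:61616</connector>
</connectors>

<!-- JGroups is used only so the brokers can find each other. -->
<broadcast-groups>
   <broadcast-group name="bg-group">
      <jgroups-file>jgroups-stack.xml</jgroups-file>
      <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
      <connector-ref>netty-connector</connector-ref>
   </broadcast-group>
</broadcast-groups>

<discovery-groups>
   <discovery-group name="dg-group">
      <jgroups-file>jgroups-stack.xml</jgroups-file>
      <jgroups-channel>artemis_broadcast_channel</jgroups-channel>
      <refresh-timeout>10000</refresh-timeout>
   </discovery-group>
</discovery-groups>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <discovery-group-ref discovery-group-name="dg-group"/>
   </cluster-connection>
</cluster-connections>
```

The backup broker would presumably use the same directories and the same JGroups file and channel, differing only in its connector address and in using the slave element inside shared-store. Whichever broker holds the lock on the shared journal is the live one, which is why a repeated duplicate node id warning under shared-store points at file locking on the shared file system.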