jms, activemq-artemis

ActiveMQ Artemis Consumer Connection Distribution


This question is a bit of a follow on to something I previously asked.

The setup is the same as outlined in that question: a symmetric cluster of 4 stand-alone ActiveMQ Artemis (v2.20) nodes, each with the same configuration (same settings, queues, etc.). Multiple client apps connect into that cluster as either message producers or message consumers, and all the clients connect with a connection string like this:

(tcp://artemis1:61616,tcp://artemis2:61616,tcp://artemis3:61616,tcp://artemis4:61616)?type=XA_CF&ha=true&retryInterval=1000&retryIntervalMultiplier=2&maxRetryInterval=32000&reconnectAttempts=-1

Typically, consumer connections are persistent and can remain for several hours.
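For illustration, this is roughly how one of the consumer apps establishes its connection with that URL. It's a minimal sketch using the Artemis JMS client; the queue name is hypothetical and the XA factory type from the URL is left out for brevity:

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;

import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class ConsumerSketch {
    public static void main(String[] args) throws Exception {
        String url = "(tcp://artemis1:61616,tcp://artemis2:61616,"
                   + "tcp://artemis3:61616,tcp://artemis4:61616)"
                   + "?ha=true&retryInterval=1000&retryIntervalMultiplier=2"
                   + "&maxRetryInterval=32000&reconnectAttempts=-1";

        // The factory picks one of the listed nodes for the initial connection;
        // it has no view of which node currently holds messages for the queue.
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(url);

        try (Connection connection = factory.createConnection()) {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("example.queue"); // hypothetical queue name
            MessageConsumer consumer = session.createConsumer(queue);
            // Long-lived consumer: receive messages in a loop for several hours.
        }
    }
}
```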

The particular issue we have is that sometimes a queue will have no consumers on any of the 4 nodes. The consuming app might be shut down for an hour's maintenance, for example. During that time a producer app might send messages to the queue, and in the absence of any consumers these are distributed fairly randomly: let's say artemis1 and artemis3 each get 1 message, while the other 2 servers have zero. A little later, consumers connect back to the queue - but they appear to connect without regard to where messages are sitting. If the client application spawns 2 consumer threads, these might connect to, say, artemis1 and artemis4, which leaves the message on artemis3 "stuck" indefinitely. Per the other question linked above, redistribution does not appear to kick in in this scenario either. I've also noted that even when the client spawns more consumer threads than there are servers, we can still end up with one server getting no consumers and therefore being left with "stranded" messages.
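For reference, redistribution of messages from a node with no consumers is governed by the redistribution-delay address setting in each broker's broker.xml (it defaults to -1, i.e. disabled). A minimal illustrative snippet - the match pattern here is just an example, not necessarily the actual config:

```xml
<!-- broker.xml excerpt (illustrative sketch):
     with redistribution-delay >= 0, a node holding messages for a queue with
     no local consumers may forward them to another cluster node that does
     have consumers, after the given delay in milliseconds. -->
<address-settings>
   <address-setting match="#">
      <redistribution-delay>0</redistribution-delay>
   </address-setting>
</address-settings>
```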

Hope that explanation of the issue makes sense! Should we expect this situation to occur, or should the consumer connections be more aware of the state of messages across the cluster on the target queue? I'd appreciate any suggestions on what we might do to avoid this problem, or any other comments.

Note the reverse does not apply - message producer clients do show awareness of consumer distribution: messages always go to servers that have a consumer connected (if any) in preference to those without.


Solution

  • For info, we started some testing and were able to readily replicate the (lack of) message redistribution issue on v2.20. We then moved up to v2.31 and could no longer replicate it with the same test cases. We've since released v2.31 to production and are no longer seeing these types of issues there either, so it looks to have been addressed somewhere between those two versions.