I am currently trying to manage an OpenMQ cluster (with GlassFish 3.0.1) and I am encountering some strange behavior.
The cluster has been running for seven months without any problems, with 2 brokers registered.
I now need to add two more brokers to the cluster temporarily. Here is what I had before:
-------------------------
Host         Primary Port
-------------------------
localhost    7676

Cluster ID                     MyCluster
Cluster is Highly Available    true

-------------------------------------------------------------------------------------------------------------
                                                                  ID of broker           Time since last
Broker ID  Address            State              Msgs in store    performing takeover    status timestamp
-------------------------------------------------------------------------------------------------------------
Broker1    192.168.0.1:7676   OPERATING          5                                       6 seconds
Broker2    192.168.0.2:7676   OPERATING          8                                       6 seconds
Then I successfully started two more brokers on two other servers and got:
-------------------------
Host         Primary Port
-------------------------
localhost    7676

Cluster ID                     MyCluster
Cluster is Highly Available    true

-------------------------------------------------------------------------------------------------------------
                                                                  ID of broker           Time since last
Broker ID  Address            State              Msgs in store    performing takeover    status timestamp
-------------------------------------------------------------------------------------------------------------
Broker1    192.168.0.1:7676   OPERATING          5                                       6 seconds
Broker2    192.168.0.2:7676   OPERATING          8                                       6 seconds
Broker3    192.168.0.3:7676   OPERATING          5                                       6 seconds
Broker4    192.168.0.4:7676   OPERATING          8                                       6 seconds
The application runs well with this configuration and automatically uses the two new brokers. The problem occurs when I stop a broker in the cluster using the following command:
./imqcmd shutdown bkr
on one of the servers. The result of ./imqcmd list bkr
is the following:
-------------------------
Host         Primary Port
-------------------------
localhost    7676

Cluster ID                     MyCluster
Cluster is Highly Available    true

-------------------------------------------------------------------------------------------------------------
                                                                  ID of broker           Time since last
Broker ID  Address            State              Msgs in store    performing takeover    status timestamp
-------------------------------------------------------------------------------------------------------------
Broker1    192.168.0.1:7676   OPERATING          5                                       6 seconds
Broker2    192.168.0.2:7676   OPERATING          8                                       6 seconds
Broker3    192.168.0.3:7676   TAKEOVER_COMPLETE  0                Broker1                6 seconds
Broker4    192.168.0.4:7676   OPERATING          8                                       6 seconds
Everything seems to be OK: the takeover is performed by Broker1. But when I look at the server.log of the GlassFish instances, I find the following line:
[C4003]: Error occurred on connection creation [192.168.0.3:7676]. - cause: java.net.ConnectException: Connection refused|#]
It looks as if GlassFish is still trying to connect to the broker that was shut down.
Is there something I missed?
Thanks for your help.
The missing command line was:
imqdbmgr remove bkr -n Broker3
After that, a list will output:

-------------------------------------------------------------------------------------------------------------
                                                                  ID of broker           Time since last
Broker ID  Address            State              Msgs in store    performing takeover    status timestamp
-------------------------------------------------------------------------------------------------------------
Broker1    192.168.0.1:7676   OPERATING          5                                       6 seconds
Broker2    192.168.0.2:7676   OPERATING          8                                       6 seconds
Broker4    192.168.0.4:7676   OPERATING          8                                       6 seconds

Broker3 is no longer registered in the broker HA cluster.
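To spot brokers that are still registered after a shutdown, a small filter over the list output can help. This is only a sketch: `stale_brokers` is a hypothetical helper name, and it assumes the broker rows look like the tables in this post (broker ID, then `host:port`, then state as the first three whitespace-separated fields).

```shell
# Hypothetical helper, not part of OpenMQ: reads `imqcmd list bkr` output on
# stdin and prints the ID of every broker whose state is not OPERATING.
# Assumes each broker row starts with: <Broker ID> <host:port> <State> ...
stale_brokers() {
  # Only rows whose second field looks like an IPv4 address with a port are
  # treated as broker rows; headers and separator lines are skipped.
  awk '$2 ~ /^[0-9.]+:[0-9]+$/ && $3 != "OPERATING" { print $1 }'
}

# Example usage (needs a running broker, so it is left commented out):
#   ./imqcmd list bkr | stale_brokers
```

With the output shown after the shutdown, this would print Broker3, which is the name then passed to imqdbmgr remove bkr -n.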