I am running RabbitMQ v3.3.5 with Erlang OTP 17.1 on Windows 2008 R2. My Dev and QA environments are stand-alone. My staging and production environments are clustered.
I am finding this one problem happening often where the RabbitMQ service is running, the RabbitMQ management console is seeing everything, but when I try running rabbitmqctl from the command line it fails with an error saying that the node is down (tried locally and on a remote server).
This problem is resolved if I restart the Windows service.
I see no error message in the RabbitMQ error log. The last message indicated that the node was up.
Below is an example output of the issue that I recently experienced on node 2 of our staging windows cluster:
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit@MYSERVER2 ...
Error: unable to connect to node rabbit@MYSERVER2: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@MYSERVER2]
rabbit@MYSERVER2:
* connected to epmd (port 4369) on MYSERVER2
* epmd reports: node 'rabbit' not running at all
no other nodes on MYSERVER2
* suggestion: start the node
current node details:
- node name: rabbitmqctl2199771@MYSERVER2
- home dir: C:\Users\RabbitMQ
- cookie hash: mn6OaTX9mS4DnZaiOzg8pA==
at this point I restart the RabbitMQ service and then try again
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit@MYSERVER2...
[{pid,3784},
{running_applications,
[{rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.5"},
{rabbit,"RabbitMQ","3.3.5"},
{os_mon,"CPO CXC 138 46","2.2.15"},
{mnesia,"MNESIA CXC 138 12","4.12.1"},
{xmerl,"XML parser","1.3.7"},
{sasl,"SASL CXC 138 11","2.4"},
{stdlib,"ERTS CXC 138 10","2.1"},
{kernel,"ERTS CXC 138 10","3.0.1"}]},
{os,{win32,nt}},
{erlang_version,
"Erlang/OTP 17 [erts-6.1] [64-bit] [smp:4:4] [async-threads:30]\n"},
{memory,
[{total,35960208},
{connection_procs,2704},
{queue_procs,5408},
{plugins,111936},
{other_proc,13695792},
{mnesia,102296},
{mgmt_db,0},
{msg_index,21816},
{other_ets,884704},
{binary,25776},
{code,16672826},
{atom,602729},
{other_system,3834221}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,3435787059},
{disk_free_limit,50000000},
{disk_free,74911649792},
{file_descriptors,
[{total_limit,8092},
{total_used,4},
{sockets_limit,7280},
{sockets_used,2}]},
{processes,[{limit,1048576},{used,139}]},
{run_queue,0},
{uptime,5}]
...done.
Any idea as to what causes this and how to automatically detect the situation?
Is this specifically a problem with running RabbitMQ on Windows?
Hostnames are case-insensitives when you are trying to resolve them. For example, LOCALHOST
and localhost
are the same host.
However, when Erlang constructs the name of a node (eg. rabbit@<hostname>
in the case of RabbitMQ), this name is case-sensitive. So rabbit@LOCALHOST
and rabbit@localhost
are two different node names, even if they run on the same host.
Recently, we (the RabbitMQ team) found out that, on Windows, the node name constructed for RabbitMQ was inconsistent. Therefore, sometimes, RabbitMQ started as a Windows service could be named rabbit@MYHOST
but rabbitmqctl
would try to reach rabbit@myhost
and fail.
Since RabbitMQ 3.6.0, the node name should be consistent.