xmpp, ejabberd

Why does an ejabberd MUC room show only one node in the table when the cluster has 2 nodes?


I have a cluster with 2 instances of ejabberd.

I created a few rooms through ejabberdctl with persistent set to true, and MySQL shows the data copied below.

mysql> select * from muc_online_room;

+---------+----------------------------+----------------------+-----------+
| name    | host                       | node                 | pid       |
+---------+----------------------------+----------------------+-----------+
| Group 1 | conference.xmpp.myapps.org | ejabberd1@instance-1 | <0.158.0> |
| Group 2 | conference.xmpp.myapps.org | ejabberd1@instance-1 | <0.125.0> |
+---------+----------------------------+----------------------+-----------+
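
The rooms were created roughly like this (from memory, so the exact commands and options may have differed; I am assuming the XMPP host behind the conference service is xmpp.myapps.org):

ejabberdctl create_room "Group 1" conference.xmpp.myapps.org xmpp.myapps.org
ejabberdctl change_room_option "Group 1" conference.xmpp.myapps.org persistent true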

What is the relevance of the node column in the muc_online_room table?

I have some issues with receiving and sending messages, even though the user was able to join the room. I am wondering if it is because the user with the issues is connected to the other node (ejabberd2@instance-2) in the cluster.

I have haproxy in front of the cluster.


Solution

  • Each ejabberd node has its own MUC service (as implemented in mod_muc.erl), and each individual MUC room (as implemented in mod_muc_room.erl) is handled by one Erlang process that lives on a single ejabberd node. That one Erlang process handling a MUC room is reachable by the MUC services of all the ejabberd nodes in the cluster, thanks to the routing information provided in the muc_online_room table.
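
    Because that routing information is visible from every node, you can list the rooms of the whole cluster from any node in it. For example, with the MUC service name from your table (a quick check; the output lists the room JIDs):

    ejabberdctl muc_online_rooms conference.xmpp.myapps.org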

    I built a cluster of two ejabberd nodes using the internal Mnesia database. I logged in to an account on the first node and created two rooms. Then I logged in to the same account, but using the second node, and joined another room.

    The muc_online_room as seen in the first node is now:

    ets:tab2list(muc_online_room).
    [{muc_online_room,{<<"sala1">>,<<"conf.localhost">>}, <0.564.0>},
     {muc_online_room,{<<"sala2b">>,<<"conf.localhost">>}, <29502.1430.0>},
     {muc_online_room,{<<"sala3c">>,<<"conf.localhost">>}, <0.977.0>}]
    

    As we can see, the rooms sala1 and sala3c are right now alive in the first node (their pids start with 0.), and sala2b is alive in the other node (its pid starts with something else).
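
    Instead of reading the pid prefix, you can ask Erlang directly which node owns a room process. A minimal sketch from the live shell (ejabberdctl debug), assuming the Mnesia-backed table shown above:

    %% Look up the routing entry of one room and ask which node runs its process.
    [{muc_online_room, _NameHost, Pid}] =
        ets:lookup(muc_online_room, {<<"sala1">>, <<"conf.localhost">>}),
    node(Pid).
    %% -> the Erlang node name, e.g. 'ejabberd1@instance-1' (illustrative)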

    The table, as seen in the second node:

    ets:tab2list(muc_online_room).
    [{muc_online_room,{<<"sala1">>,<<"conf.localhost">>}, <29512.564.0>},
     {muc_online_room,{<<"sala2b">>,<<"conf.localhost">>}, <0.1430.0>},
     {muc_online_room,{<<"sala32c">>,<<"conf.localhost">>}, <29512.977.0>}]
    

    The sala2b room lives in this second node (its pid starts with 0.), and the other two rooms are alive in the other node.

    Each room lives in the node whose MUC service was handling the user when that user joined (and thereby created) the room.
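
    You can confirm this from a single live shell by asking every node in the cluster for its view of the table; a sketch, assuming the same Mnesia setup as in this test:

    %% Dump the muc_online_room table as seen by each cluster node.
    [{Node, rpc:call(Node, ets, tab2list, [muc_online_room])}
     || Node <- [node() | nodes()]].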

    Now I stop both ejabberd nodes, and start only the first node. All the rooms are started in that node, and the table shows:

    ets:tab2list(muc_online_room).
    [{muc_online_room,{<<"sala1">>,<<"conf.localhost">>}, <0.472.0>},
     {muc_online_room,{<<"sala2b">>,<<"conf.localhost">>}, <0.470.0>},
     {muc_online_room,{<<"sala32c">>,<<"conf.localhost">>}, <0.471.0>}]
    

    You are using SQL storage; in that case there is only one database and only one muc_online_room table, shared by all the nodes, which is why it has a column named node. That column indicates on which node the MUC room's Erlang process is alive, together with its pid.
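
    With SQL storage you can get the same overview straight from the database, for example by counting rooms per node in the table from your question:

    mysql> SELECT node, COUNT(*) AS rooms FROM muc_online_room GROUP BY node;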

    If you join a new room in the MUC service of the second ejabberd node, you will see that the new room is alive in the second node. If you then stop both nodes and start only one of them, all the rooms will be started in that node.
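
    Regarding the message problems: as explained above, a user connected to ejabberd2@instance-2 can still reach a room whose process lives on ejabberd1@instance-1, because the MUC service on each node knows where every room process is. If you want to see which node each user session is actually connected to, you can check the session list (the owning node is one of the fields in the output):

    ejabberdctl connected_users_info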