My setup is an Apache Httpd 2.4 server in front of 4 Wildfly 10.1 server nodes. I'm using mod_cluster for load balancing and everything is running fine most of the time. But several times each day, this appears in Apache's error log:
[Wed Mar 15 09:15:18.736665 2017] [proxy:error] [pid 18936:tid 1784] AH00940: http: disabled connection for (10.10.87.53)
[Wed Mar 15 09:15:59.955515 2017] [proxy:error] [pid 18936:tid 1784] AH00940: http: disabled connection for (10.10.87.52)
When those errors appear, users complain that they have been logged out of the system. This happens because I'm using sticky sessions: when the errors appear, everyone seems to be moved from one node to another, which means their sessions are lost.
However, even though those errors are in the log, the nodes are still active and working when I test them a minute later. So whatever disconnect occurred, it is only momentary.
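To see whether the disconnects line up with particular events (slow requests, uploads, and so on), it can help to pull the AH00940 entries out of the error log and group them by node. Here is a minimal Python sketch, assuming the log lines have exactly the format shown above; the function name `parse_disabled_events` is my own and not part of any Apache tooling.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches error-log lines of the form shown above, e.g.
# [Wed Mar 15 09:15:18.736665 2017] [proxy:error] [pid ...] AH00940: http: disabled connection for (10.10.87.53)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[proxy:error\] .*?"
    r"AH00940: \w+: disabled connection for \((?P<host>[^)]+)\)"
)

def parse_disabled_events(lines):
    """Return {backend address: [datetime, ...]} for every AH00940 line.

    Hypothetical helper; assumes the timestamp layout from the log
    excerpt above (English day/month names, microsecond precision).
    """
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a "disabled connection" entry
        stamp = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
        events[match.group("host")].append(stamp)
    return dict(events)

if __name__ == "__main__":
    sample = [
        "[Wed Mar 15 09:15:18.736665 2017] [proxy:error] [pid 18936:tid 1784] "
        "AH00940: http: disabled connection for (10.10.87.53)",
        "[Wed Mar 15 09:15:59.955515 2017] [proxy:error] [pid 18936:tid 1784] "
        "AH00940: http: disabled connection for (10.10.87.52)",
    ]
    for host, stamps in parse_disabled_events(sample).items():
        print(host, [s.isoformat() for s in stamps])
```

Feeding it the real error log (for example via `open(path)`) gives a per-node list of timestamps that can be compared against the application's access log.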
Here is my Apache mod_cluster config:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_ajp_module modules/mod_proxy_ajp.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule cluster_slotmem_module modules/mod_cluster_slotmem.so
LoadModule manager_module modules/mod_manager.so
LoadModule proxy_cluster_module modules/mod_proxy_cluster.so
LoadModule advertise_module modules/mod_advertise.so
<IfModule manager_module>
Listen 10.10.87.50:16666
ManagerBalancerName nmcluster
<VirtualHost 10.10.87.50:16666>
<Location />
Require ip 10.10.87
</Location>
KeepAliveTimeout 300
MaxKeepAliveRequests 0
AdvertiseFrequency 5
AllowDisplay On
AdvertiseGroup 224.0.1.105:23364
EnableMCPMReceive
<Location /mod_cluster_manager>
SetHandler mod_cluster-manager
Require ip 10.10.87
</Location>
</VirtualHost>
</IfModule>
And here is the config in Wildfly:
<subsystem xmlns="urn:jboss:domain:modcluster:2.0">
<mod-cluster-config advertise-socket="modcluster" proxies="proxy" balancer="nmcluster" connector="default">
<dynamic-load-provider>
<load-metric type="cpu"/>
</dynamic-load-provider>
</mod-cluster-config>
</subsystem>
...
<http-listener name="default" socket-binding="http" redirect-socket="https" proxy-address-forwarding="true" enable-http2="true"/>
...
<socket-binding name="modcluster" port="0" multicast-address="224.0.1.105" multicast-port="23364"/>
How can I make sticky sessions more sticky? Or even better, how can I stop the error from happening?
I still want users to move to another node, if a node is down, but I don't want it to happen just because a node is a little slow for a few seconds, because then it ends up doing more harm than good.
I also don't understand why these disconnects happen. Any theories would be appreciated.
It appears that when someone uploads a large file that takes more than 10 seconds for the Apache server and Wildfly server to handle between them, the "disabled connection" error above occurs, and everyone loses their session and is logged out.
The solution is to set ping to something higher than 10 seconds in Wildfly, for example ping="60", like this:
<subsystem xmlns="urn:jboss:domain:modcluster:2.0">
<mod-cluster-config advertise-socket="modcluster" proxies="proxy" balancer="nmcluster" connector="default" ping="60">
<dynamic-load-provider>
<load-metric type="cpu"/>
</dynamic-load-provider>
</mod-cluster-config>
</subsystem>
Furthermore, after Wildfly has been restarted with the change, it is very important to restart Apache as well. If you don't restart Apache, mod_cluster-manager will tell you the new setting, but Apache won't be using it.
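With four nodes it is easy to miss one, so before restarting anything it may be worth checking that every node's configuration really carries the new value. Below is a rough Python sketch that reads the ping attribute from the mod_cluster subsystem XML. The function name `read_ping_seconds` is hypothetical, and which file to feed it (standalone.xml, standalone-ha.xml, or a domain profile) depends on your installation.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the subsystem element shown above.
MODCLUSTER_NS = "urn:jboss:domain:modcluster:2.0"

def read_ping_seconds(xml_text):
    """Return the ping value (in seconds) set on mod-cluster-config.

    Returns None when the element or the attribute is absent, in which
    case the server falls back to its built-in default.
    Hypothetical helper, not part of Wildfly or mod_cluster.
    """
    root = ET.fromstring(xml_text)
    tag = "{%s}mod-cluster-config" % MODCLUSTER_NS
    # iter() also matches the root itself, so fragments and full files both work.
    for element in root.iter(tag):
        ping = element.get("ping")
        return int(ping) if ping is not None else None
    return None

if __name__ == "__main__":
    fragment = (
        '<subsystem xmlns="urn:jboss:domain:modcluster:2.0">'
        '<mod-cluster-config advertise-socket="modcluster" proxies="proxy" '
        'balancer="nmcluster" connector="default" ping="60"/>'
        "</subsystem>"
    )
    print(read_ping_seconds(fragment))
```

This only confirms what is written in the file. Whether Apache is actually using the value still depends on the restart described above.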